What Next? A Half-Dozen Data Management Research
Goals for Big Data and the Cloud
Surajit Chaudhuri
Microsoft Research
PODS 2012
About the Author
O Surajit Chaudhuri
O Principal researcher at Microsoft Research
O Leads the Data Management, Exploration and
Mining group
Two Accelerating Trends
O Big Data
O Why Big Data?
O Low cost of acquisition of data
O Low cost of data storage
O Specific characteristics
O Providing additional insight
O Real-time business analytics
O Deep analytics
O Seeking low cost, highly scalable analytics
platforms
Two Accelerating Trends
O Cloud Computing
O Infrastructure as Service
O A rent model of usage that has the benefit of
elasticity
O Software as Service
O Platform as Service
O Enable creation of scalable applications
without having to think in terms of virtual
machines
Six Challenges
O Data Privacy
O Approximate Results
O Data Exploration To Enable Deep Analytics
O Enterprise Data Enrichment With Web And
Social Media
O Query Optimization
O Performance Isolation For Multi-tenancy
Data Privacy
O Exploding use of online services and
proliferation of mobile devices
O Increasing volume and variety of data sets
within an enterprise
O Challenge 1:
O Redefine the abstractions for access control
and auditing for data platforms
Data Privacy
O Three pillars of privacy mechanisms
O Access Control
O Fine-grained vs. coarse-grained
O Auditing
O Received less attention in research
O Statistical Privacy
O Whether this framework can be reduced to
practice is an open question
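The difference between coarse-grained and fine-grained access control, and the role of an audit trail, can be sketched as follows. This is a minimal illustration and not a mechanism described in the talk: the `Table` class, the `read_coarse` and `read_fine` helpers, and the patient data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical in-memory table that records every read in an audit log.
    name: str
    rows: list
    audit_log: list = field(default_factory=list)


def read_coarse(table, user, table_grants):
    """Coarse-grained: the user sees the whole table or nothing."""
    allowed = table.name in table_grants.get(user, set())
    table.audit_log.append((user, table.name, "coarse", allowed))
    return list(table.rows) if allowed else []


def read_fine(table, user, row_policies):
    """Fine-grained: a per-user predicate decides visibility row by row."""
    predicate = row_policies.get(user, lambda row: False)
    visible = [row for row in table.rows if predicate(row)]
    table.audit_log.append((user, table.name, "fine", len(visible)))
    return visible


patients = Table("patients", [
    {"id": 1, "doctor": "alice", "diagnosis": "flu"},
    {"id": 2, "doctor": "bob", "diagnosis": "asthma"},
])
table_grants = {"alice": {"patients"}}
row_policies = {"alice": lambda row: row["doctor"] == "alice"}

coarse_rows = read_coarse(patients, "alice", table_grants)  # whole table
fine_rows = read_fine(patients, "alice", row_policies)      # only alice's rows
print(len(coarse_rows), len(fine_rows), patients.audit_log)
```

The audit log here only records who read what; deciding after the fact whether a given read touched sensitive data is the harder auditing question the slide points to.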
Approximate Results
O Challenge 2:
O Devise a querying technique for approximate
results that is an order of magnitude faster
compared to traditional query execution
O Problems
O Simplest semantics could not provide a
uniform random sample
O Efficiently obtaining a sample is nontrivial
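As a rough illustration of what an approximate result looks like, the sketch below estimates a SUM from a uniform random sample and reports an approximate 95% confidence interval. It assumes the easy case, where a uniform sample of a single in-memory table is available; the function name `approximate_sum` and the synthetic data are hypothetical. It does not address the hard part named above, which is obtaining such a sample efficiently for general queries.

```python
import math
import random


def approximate_sum(values, sample_size, seed=0):
    """Estimate sum(values) from a uniform sample, with a rough 95% half-width."""
    rng = random.Random(seed)
    n = len(values)
    sample = rng.sample(values, sample_size)  # uniform, without replacement
    mean = sum(sample) / sample_size
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    estimate = n * mean  # scale the sample mean up to the full table
    half_width = 1.96 * n * math.sqrt(variance / sample_size)
    return estimate, half_width


values = list(range(1, 100_001))  # synthetic column with 100,000 rows
estimate, half_width = approximate_sum(values, sample_size=1_000)
exact = sum(values)
print(f"exact={exact} estimate={estimate:.0f} +/- {half_width:.0f}")
```

The estimate reads 1% of the rows, which is where the hoped-for speedup comes from, at the price of an error bar instead of an exact answer.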
Data Exploration To Enable
Deep Analytics
O Challenge 3:
O Build an environment to enable data exploration
for deep analytics
O Traditional Machine Learning in Big Data
O Requires understanding of probability and
statistics
O Hard to identify candidate features
Data Exploration To Enable
Deep Analytics
O Difficulties
O Identify relevant fragments
O Data cleaning techniques
O Sample results progressively
O Obtain rich visualization
Enterprise Data Enrichment
With Web And Social Media
O Challenge 4:
O Identify services that, given a list of entities
and their properties, return enrichments of the
entities based on information in web and
social media with sufficiently high precision
and recall
Enterprise Data Enrichment
With Web And Social Media
O Unique properties of web data
O Vastness
O Statistical redundancy
O Availability of user feedback
O Key focus
O High precision
O Good recall
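Precision and recall for an enrichment service can be measured against a hand-labelled reference set, as in the sketch below. The `precision_recall` helper, the entity names, and the property values are hypothetical and serve only to show the two ratios.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned facts that are correct.
    Recall: share of correct facts that were returned."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# (entity, property, value) triples; hypothetical reference labels.
relevant = {
    ("Contoso", "city", "Seattle"),
    ("Contoso", "industry", "retail"),
    ("Fabrikam", "city", "Austin"),
    ("Fabrikam", "industry", "manufacturing"),
    ("Northwind", "city", "London"),
}
# Hypothetical output of an enrichment service: 3 correct facts, 1 wrong.
returned = {
    ("Contoso", "city", "Seattle"),
    ("Contoso", "industry", "retail"),
    ("Fabrikam", "city", "Austin"),
    ("Northwind", "city", "Paris"),
}

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four returned facts are correct and three of the five reference facts are found, which matches the emphasis on high precision first and good recall second.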
Query Optimization
O Challenge 5:
O Rethink query optimization for data parallel
platforms
O Problems
O Cost of shuffling data is considerable
O Estimation of sizes of intermediate results
O Can’t create pre-defined statistical summaries
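Why shuffle cost and intermediate-size estimates matter together can be seen in a toy cost model. The sketch below is an assumption made for illustration, not the optimizer discussed in the talk: it counts only bytes moved over the network, and the function names and sizes are hypothetical.

```python
def shuffle_join_bytes(left_bytes, right_bytes, workers):
    # Repartition both inputs by join key; on average a row stays on its
    # current worker with probability 1/workers and moves otherwise.
    return (left_bytes + right_bytes) * (workers - 1) // workers


def broadcast_join_bytes(left_bytes, right_bytes, workers):
    # Copy the smaller input to every other worker; the larger input stays put.
    return min(left_bytes, right_bytes) * (workers - 1)


def cheaper_join(left_bytes, right_bytes, workers):
    """Pick the strategy that moves fewer bytes under this toy model."""
    shuffle = shuffle_join_bytes(left_bytes, right_bytes, workers)
    broadcast = broadcast_join_bytes(left_bytes, right_bytes, workers)
    return ("broadcast", broadcast) if broadcast < shuffle else ("shuffle", shuffle)


# The optimizer believes the intermediate result on the right is about 100 MB.
planned = cheaper_join(10**12, 10**8, workers=100)
# At run time it turns out to be about 100 GB.
actual = cheaper_join(10**12, 10**11, workers=100)
print(planned, actual)
```

With the estimated size, broadcasting looks far cheaper; with the true size, the same plan would move roughly nine times more data than a shuffle join. Without pre-built statistics, a misestimate of this kind is hard to rule out.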
Performance Isolation For
Multi-tenancy
O Challenge 6:
O Define a model of performance SLAs for
multi-tenant data systems that can be
metered at low overhead.
O Develop resource allocation techniques to
support multi-tenancy.
Performance Isolation For
Multi-tenancy
O Performance Isolation
O Avoid interference with other tenants
O No SLA for performance isolation exists
O Service level agreements offered by cloud
service providers
O Problems
O Metering violation of performance SLAs
O Resource allocation
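One way to picture low-overhead metering is a per-interval check against a resource reservation. The sketch below assumes a hypothetical SLA model in which each tenant is promised a fixed amount of CPU time per metering interval whenever it asks for it; the talk does not prescribe this model, and `meter_violations` and the sample numbers are illustrative only.

```python
def meter_violations(intervals, promised_cpu_ms):
    """Count intervals in which the tenant received less than it was entitled to.

    Each interval is a (demanded_cpu_ms, allocated_cpu_ms) pair. The tenant is
    entitled to its demand, capped at the promised reservation.
    """
    violations = 0
    for demanded, allocated in intervals:
        entitled = min(demanded, promised_cpu_ms)
        if allocated < entitled:
            violations += 1
    return violations


# Hypothetical measurements for one tenant over four metering intervals.
intervals = [(500, 500), (800, 300), (100, 100), (400, 250)]
violation_count = meter_violations(intervals, promised_cpu_ms=400)
print(violation_count)
```

The check needs only two counters per tenant per interval, so it is cheap to run; an idle tenant that demands little is never counted as violated.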
Other Issues
O Scalable Data Platforms
O Data analysis platforms for online, near-online,
and batch-oriented analysis workloads
O Operational Business Intelligence
O Shorten the gap between data acquisition
and business action
O Manageability and Auto-Tuning
O Automated solutions for manageability
Conclusion
O Research challenges
O Develop infrastructure and tools that help
enterprises derive insight from their data assets
O Strong movement towards cloud
infrastructure
O Seize this opportunity to address hard
problems