Evacuating the Comfort Zone: (Via Curriculum Reform?)

Comfy Topics
• Logical data models and languages
• Query optimization/execution
• Consistency models and mechanisms
• Storage architectures
• Enterprise IT applications
Moving Outside the Zone
• Rethinking system architectures
– Deep memory hierarchies, componentization, adaptive algorithms, extreme scales (nano to global)
• Embracing probabilistic reasoning
– In data analysis, adaptive algorithms (again!), online user interactions, data modeling and integration, lossy compression
Course 1: Data Systems
• Not so radical: infect the OS course with the RSS
– Traditional OS material (scheduling, protection,
resource management)
– File & Record storage
– Transactions, Concurrency, Recovery
– Storage Hierarchies
– Dataflow architectures: query plans, network support
• Big pedagogical benefit to merging this material
– Two design targets (OS/FS vs. DBMS)
– Leads to instructive architectural tradeoffs
– Illustrates 2 design philosophies (bottom-up vs. top-down)
Course 2: Modeling & Analysis
Relational + IR + Statistics + Information Theory
• Review of basic math
– 1st-order logic
– Central Limit Theorem, Chernoff/Hoeffding bounds
– Simple information theory: entropy, error metrics
• Data Modeling
– Logical: Relational normalization, ontologies, IR bag-of-words
– Probabilistic: simple graphical models (Bayes nets), IR vector space
• Data Analysis
– Relational-style query optimization/execution, OLAP
– Sampling and summarization
– Boolean IR, ranked retrieval, link analysis, info extraction
– Predictive analyses: classification, clustering
– Data Visualization
• Pragmatics, Exercises:
– Decision-support systems & tasks: queries, mining tools, etc.
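The Python sketch below is not part of the original slides. It shows one way that two of the math-review topics (Hoeffding bounds and entropy) could feed directly into the "Sampling and summarization" analysis topic. It assumes i.i.d. samples bounded in [lo, hi] and uses the standard two-sided Hoeffding inequality, P(|sample mean − true mean| ≥ ε) ≤ 2·exp(−2nε²/(hi − lo)²). All function names are hypothetical and were chosen only for illustration.

```python
import math
import random


def hoeffding_sample_size(eps: float, delta: float, lo: float = 0.0, hi: float = 1.0) -> int:
    """Smallest n such that, for i.i.d. samples bounded in [lo, hi], Hoeffding's
    inequality guarantees P(|sample mean - true mean| >= eps) <= delta.
    Solves 2 * exp(-2 * n * eps**2 / (hi - lo)**2) <= delta for n."""
    width = hi - lo
    return math.ceil(width ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


def sampled_mean(values, n, rng):
    """Estimate the mean of `values` from n uniform draws with replacement."""
    return sum(rng.choice(values) for _ in range(n)) / n


def entropy_bits(counts):
    """Shannon entropy, in bits, of the empirical distribution given by counts.
    Zero counts contribute nothing (0 * log 0 is taken as 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical 0/1 attribute (e.g. "clicked") with a true mean of 0.3.
    population = [0] * 700 + [1] * 300
    n = hoeffding_sample_size(eps=0.05, delta=0.01)  # 1060 samples
    estimate = sampled_mean(population, n, random.Random(0))
    print(f"n = {n}, estimated mean = {estimate:.3f} (true mean 0.3)")
    print(f"entropy of the attribute = {entropy_bits([700, 300]):.3f} bits")
```

Note that the required sample size depends only on ε, δ and the value range, and not on the population size. This independence is what makes sampling attractive for summarizing very large tables.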
Assertions
• Course 1 is a Good Idea
– Taught for 4 years at the grad level at Berkeley; it works great
• Course 2 is The Future
– It’s in demand
• Think of what Business, BioEng, etc. really want!
• Do our systems students even know how to manage and exploit experimental data?
– A curriculum, or a research agenda?
• KDD is a piece of this
• But fragmentary
• Opportunity here!
– The DB textbook market is saturated :-)