Patterns not just Data

Patterns not just Data
• Information overload is escalating beyond anything our
traditional assumptions anticipated.
• “The world produces between 1 and 2 exabytes of unique
information per year, which is roughly 250 megabytes for
every man, woman, and child on earth.” [P. Lyman and H.R.
Varian, "How Much Information", 2000. Retrieved from
http://www.sims.berkeley.edu/how-much-info in January 2002]
• Still, even novel DBMS architectures cannot close the gap
between the exponential growth of data and the slow growth
of our understanding [Gray02], owing to methodological
bottlenecks and basic human limitations.
Lowell 2003
Timos Sellis
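The per-person figure in the quote can be checked with simple arithmetic. The Python sketch below assumes decimal units (1 EB = 10^18 bytes, 1 MB = 10^6 bytes) and a world population of roughly 6 billion around 2000; neither assumption appears on the slide.

```python
# Back-of-the-envelope check of the per-person figure quoted above.
# Assumptions (not stated on the slide): decimal units and a world
# population of roughly 6 billion around the year 2000.
EXABYTE = 10**18              # bytes, decimal definition
MEGABYTE = 10**6              # bytes, decimal definition
WORLD_POPULATION = 6 * 10**9  # assumed, approximate


def per_capita_mb(exabytes_per_year: float,
                  population: int = WORLD_POPULATION) -> float:
    """Megabytes of new unique information per person per year."""
    return exabytes_per_year * EXABYTE / population / MEGABYTE


for eb in (1.0, 1.5, 2.0):
    print(f"{eb} EB/year -> about {per_capita_mb(eb):.0f} MB per person")
```

Under these assumptions the 1 to 2 EB range works out to roughly 167 to 333 MB per person, and the quoted 250 MB matches the midpoint of 1.5 EB.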
Patterns not just Data
• To compensate for these shortcomings, we reduce the
available data to knowledge artifacts (e.g., clusters, rules)
through data processing methods (pattern recognition,
data mining, knowledge extraction).
• This reduces their number and size (so that they are
manageable by humans) while preserving as much as
possible of the hidden, interesting, or otherwise available
information they contain.
• These knowledge artifacts are patterns. In general, patterns
can be distinguished by how they are constructed and what
they are used for.
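As one illustration of reducing data to a knowledge artifact, the sketch below summarizes 900 synthetic 2-D points as three clusters, each kept only as a centroid and a member count. k-means is our choice of technique for the example, not something the slides prescribe; the data, seed values and function name are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: summarize `points` as k centroids plus member counts."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    sizes = [0] * k
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assignment step: each point joins its nearest centroid.
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        sizes = [len(c) for c in clusters]
    return centroids, sizes


# Hypothetical raw data: 900 points scattered around three centres.
gen = random.Random(42)
raw = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
       for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(300)]

patterns, counts = kmeans(raw, k=3)
print(f"{len(raw)} points reduced to {len(patterns)} cluster patterns")
for (x, y), n in zip(patterns, counts):
    print(f"  centroid=({x:.2f}, {y:.2f})  members={n}")
```

The three (centroid, size) pairs are far smaller than the raw points yet still answer questions such as "where is the data concentrated, and how much of it is there?"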
Patterns not just Data - Applications
• Data Mining
– Clusters, classifications, association rules, time series
• Signal Processing
– Music, Voice, Vision
• Information Retrieval
– Text corpora
• Mathematical applications
– Graphs, numbers, Cryptography
• …and many more
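To make one of the listed pattern types concrete, here is a small Python sketch of association rules over a toy basket dataset. Each rule a -> b is kept with its support (fraction of baskets containing both items) and confidence (count of baskets with both items divided by count of baskets with a). The baskets, thresholds and function names are made up for illustration, and only single-item rules are enumerated, a much simpler search than Apriori-style miners.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

Basket = FrozenSet[str]


def count(itemset: Basket, transactions: List[Basket]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)


def support(itemset: Basket, transactions: List[Basket]) -> float:
    """Fraction of transactions containing `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs: Basket, rhs: Basket, transactions: List[Basket]) -> float:
    """Share of transactions with `lhs` that also contain `rhs`."""
    return count(lhs | rhs, transactions) / count(lhs, transactions)


def mine_rules(transactions: List[Basket], min_support: float,
               min_conf: float) -> List[Tuple[str, str, float, float]]:
    """Enumerate single-item rules a -> b that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        lhs, rhs = frozenset([a]), frozenset([b])
        s = support(lhs | rhs, transactions)
        if s >= min_support:
            c = confidence(lhs, rhs, transactions)
            if c >= min_conf:
                rules.append((a, b, s, c))
    return rules


# Hypothetical shopping baskets.
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
)]

for a, b, s, c in mine_rules(baskets, min_support=0.4, min_conf=0.7):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Like a cluster, each surviving rule is a compact artifact that stands in for the raw transactions while carrying measures of how well it is supported by them.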
Patterns not just Data – The Challenge
• Can we find a universal model that allows
modelling patterns in general?
• What would a query language for patterns look
like?
• What would be the essential “new” system
components (indexing, visualization, etc.)?
• Can such systems be built on top of an ORDBMS?
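The slide leaves the universal pattern model open. Purely as a hypothetical sketch of what such a model and a query over it might look like, the record below stores every pattern, whatever its type, as a structure, a set of quality measures, and a reference to its source data, and a single `select` function filters patterns by type and predicate. The `Pattern` fields, the `select` function and the sample patterns are our own assumptions, not a design taken from the slides.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Pattern:
    """Hypothetical uniform record for a pattern of any type."""
    ptype: str                  # e.g. "cluster", "assoc_rule"
    structure: Dict[str, Any]   # what the pattern looks like
    measures: Dict[str, float]  # quality measures (support, size, ...)
    source: str                 # name of the data set it summarizes


def select(patterns: List[Pattern], ptype: Optional[str] = None,
           where: Callable[[Pattern], bool] = lambda p: True) -> List[Pattern]:
    """Hypothetical pattern query: filter by type and by an arbitrary predicate."""
    return [p for p in patterns
            if (ptype is None or p.ptype == ptype) and where(p)]


# Heterogeneous patterns kept in one (hypothetical) pattern store.
store = [
    Pattern("cluster", {"centroid": (0.1, -0.2)},
            {"size": 300, "radius": 1.2}, "sensor_readings"),
    Pattern("cluster", {"centroid": (5.0, 4.9)},
            {"size": 12, "radius": 0.4}, "sensor_readings"),
    Pattern("assoc_rule", {"lhs": ["bread"], "rhs": ["butter"]},
            {"support": 0.6, "confidence": 0.75}, "baskets"),
    Pattern("assoc_rule", {"lhs": ["milk"], "rhs": ["eggs"]},
            {"support": 0.4, "confidence": 0.5}, "baskets"),
]

strong_rules = select(store, "assoc_rule", lambda p: p.measures["confidence"] >= 0.7)
big_clusters = select(store, "cluster", lambda p: p.measures["size"] >= 100)
from_baskets = select(store, where=lambda p: p.source == "baskets")
print(len(strong_rules), len(big_clusters), len(from_baskets))
```

Keeping the link to the source data as an explicit field is what would let such a system move from a pattern back to the data it summarizes; how that link is represented is exactly one of the open questions the slide raises.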
Approximate Data/Answers
• In most large, real-world applications, approximation is
the only feasible solution.
• At the same time, the user needs to know the
quality of the approximations, both at the information
level and at the answer level.
• The DBMS must provide support at all
levels: models, query languages, indexes, physical
storage, and visualization of results.
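One common way to return an approximate answer together with a statement of its quality is sampling: answer an AVG from a uniform random sample and report a confidence interval next to the estimate. The Python sketch below assumes a uniform sample drawn without replacement and the normal approximation; the column, sample size and function name are illustrative.

```python
import math
import random
import statistics
from typing import List, Tuple


def approx_avg(values: List[float], sample_size: int, z: float = 1.96,
               seed: int = 0) -> Tuple[float, float]:
    """Estimate AVG(values) from a uniform sample drawn without replacement.

    Returns (estimate, half_width): under the normal approximation, the
    interval estimate +/- half_width has roughly 95% coverage when z = 1.96.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    # Finite-population correction, since we sample without replacement.
    fpc = math.sqrt((len(values) - sample_size) / (len(values) - 1))
    return estimate, z * std_err * fpc


# Hypothetical skewed column of 100,000 values (mean about 50).
gen = random.Random(7)
column = [gen.expovariate(1 / 50) for _ in range(100_000)]

est, half = approx_avg(column, sample_size=1_000)
exact = statistics.fmean(column)
print(f"approximate AVG = {est:.2f} +/- {half:.2f} (exact: {exact:.2f})")
```

The half-width is the answer-level quality indicator the slide asks for: it tells the user how far the approximate answer may plausibly be from the exact one, using only 1% of the data.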
Approximate Data/Answers – The Challenge
• Scalable approximation schemes (histograms,
wavelets, etc.)
• Learning how much approximation a user is willing
to tolerate, i.e., which approximate answers they deem acceptable
• What is an approximation of an XML document?
How much schema/ontology information is
required?
• Approximation may change according to the
context of a user query; how is this taken into
account?
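Histograms are the simplest of the approximation schemes named above. The Python sketch below assumes an equi-width histogram over a known value range and the classic assumption that values are spread uniformly inside each bucket: the column is reduced to ten bucket counts, and range-count queries are answered from those counts alone. The class name, bucket count and data are illustrative.

```python
import random
from typing import Iterable, List


class EquiWidthHistogram:
    """Approximates a numeric column over [lo, hi) with equal-width buckets."""

    def __init__(self, values: Iterable[float], lo: float, hi: float,
                 buckets: int) -> None:
        self.lo, self.width = lo, (hi - lo) / buckets
        self.counts: List[int] = [0] * buckets
        for v in values:
            # Clamp so that boundary values land in the first/last bucket.
            i = min(max(int((v - lo) / self.width), 0), buckets - 1)
            self.counts[i] += 1

    def estimate_range(self, a: float, b: float) -> float:
        """Approximate COUNT of values in [a, b), assuming values are spread
        uniformly inside each bucket."""
        total = 0.0
        for i, c in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
            total += c * overlap / self.width
        return total


# Hypothetical skewed column: 50,000 values with a peak near 200.
gen = random.Random(1)
column = [gen.triangular(0, 1000, 200) for _ in range(50_000)]

hist = EquiWidthHistogram(column, lo=0, hi=1000, buckets=10)
exact = sum(150 <= v < 420 for v in column)
print(f"estimate={hist.estimate_range(150, 420):.0f}  exact={exact}")
```

The estimate differs from the exact count mainly in the two partially covered buckets, where the data is not actually uniform; more buckets, or equi-depth buckets, typically shrink that error at the cost of a larger summary.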