Status xuPA Knowledge Transfer – 06/08/31

Mining Structured vs. Unstructured Data
Where is the structure and where did the semantics go?
Rahim Yaseen
SAP Labs LLC.
Why mining works for structured data
Diagram layers (top to bottom): Reports, Queries, Relational Data Model, Data
• Reports and queries: Rich semantics are usually expressed in queries and reports, which have a priori knowledge of the data models.
• Relational data model: For relational databases, the data model combines the data representation specification with its storage as relational data. Views can sometimes express alternate representational models that differ from the underlying table structures.
For relational data:
• There is no separation between the semantic data model and the logical storage model.
• Both coincide in a single data model, and the data definition carries limited semantics.
• The semantics are captured in the richness of the queries, which form well-known associations based on expert knowledge of the relationships in the data models.
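To make the relational point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the table names (customer, sales_order), and the revenue_by_region view are hypothetical. The table definitions record only column names and types; the knowledge that a customer places orders, and that revenue rolls up by region, lives in the join inside the view, which also serves as an alternate representation of the stored tables.

```python
import sqlite3

# Hypothetical schema: the DDL records only names and types; the meaning
# "a customer places orders" is expressed by the join in the view below.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
INSERT INTO sales_order VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);

-- The view is an alternate representation over the same stored tables.
CREATE VIEW revenue_by_region AS
SELECT c.region AS region, SUM(o.amount) AS revenue
FROM sales_order AS o
JOIN customer AS c ON c.id = o.customer_id
GROUP BY c.region;
""")


def revenue_by_region(connection):
    """Return {region: revenue}; the caller needs no knowledge of the join."""
    cursor = connection.execute(
        "SELECT region, revenue FROM revenue_by_region ORDER BY region"
    )
    return dict(cursor.fetchall())


print(revenue_by_region(conn))  # {'AMER': 75.0, 'EMEA': 350.0}
```

Anyone querying the view gets region-level semantics without knowing how the tables relate; that expert knowledge was written once, in SQL, rather than in the data definition.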
© SAP AG 2006, xuPA Mid-Term Strategy / Speaker Name / 2
internal/confidential
What will it take to mine unstructured data?
Why free-text search is not the answer
• The data has no structural model to which meaningful semantics can be applied.
• As a result, queries have limited semantics and are not rich enough to produce the desired outcomes.
• Ad hoc search lacks the richness of predefined queries built on known structure and semantics, which limits the relevance of its output.
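A toy illustration of the gap, using a hypothetical three-item catalog stored both as free-text descriptions and as structured records. The request "a laptop with at least 16 GB of RAM for at most 1000 USD" can only be approximated by keyword matching, while a query over known attributes states it directly. This sketches the contrast; it does not model any particular search engine.

```python
# Hypothetical catalog: the same three products as free text and as records.
descriptions = [
    "UltraBook 14 laptop, 16 GB RAM, price 1499 USD",
    "Budget laptop with 8 GB RAM, price 499 USD",
    "Workstation laptop, 32 GB RAM, price 949 USD",
]
records = [
    {"name": "UltraBook 14", "ram_gb": 16, "price_usd": 1499},
    {"name": "Budget laptop", "ram_gb": 8, "price_usd": 499},
    {"name": "Workstation laptop", "ram_gb": 32, "price_usd": 949},
]


def keyword_search(docs, terms):
    """Ad hoc search: documents containing every term (case-insensitive)."""
    return [d for d in docs if all(t.lower() in d.lower() for t in terms)]


def structured_query(rows, min_ram_gb, max_price_usd):
    """Predefined query over known attributes, so numeric ranges are expressible."""
    return [
        r["name"]
        for r in rows
        if r["ram_gb"] >= min_ram_gb and r["price_usd"] <= max_price_usd
    ]


# Request: a laptop with at least 16 GB RAM that costs at most 1000 USD.
# Keyword search can only match the literal string "16 GB" and cannot check price.
print(keyword_search(descriptions, ["laptop", "16 GB"]))
# The structured query states the range conditions directly.
print(structured_query(records, min_ram_gb=16, max_price_usd=1000))
```

Keyword matching returns the over-budget UltraBook and misses the 32 GB workstation; the structured query returns only the workstation.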
Converting unstructured data to structured data is not the answer either
• Applying an ETL-like technique to convert the data to a structured form is limiting.
• It does not guarantee that all the data of interest is captured.
• It provides only a single, fixed interpretation of the unstructured data.
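A minimal sketch of the ETL limitation, assuming a hypothetical target schema (TARGET_SCHEMA) whose two fields are defined by regular expressions. Whatever the schema does not anticipate is dropped during conversion, so the stored row reflects one fixed interpretation of the text.

```python
import re

# Hypothetical target schema, fixed at ETL design time: field -> regex with one group.
TARGET_SCHEMA = {
    "ram_gb": r"(\d+)\s*GB RAM",
    "price_usd": r"price\s+(\d+)\s*USD",
}


def etl_extract(text, schema=TARGET_SCHEMA):
    """Convert free text into one fixed-schema row; unmapped content is discarded."""
    row = {}
    for field, pattern in schema.items():
        match = re.search(pattern, text)
        row[field] = int(match.group(1)) if match else None
    return row


doc = "Workstation laptop, 32 GB RAM, 2.6 kg, battery 10 h, price 949 USD"
row = etl_extract(doc)
print(row)  # {'ram_gb': 32, 'price_usd': 949}
# Weight and battery life never reach the stored row, so a later question
# such as "which laptops weigh under 2 kg?" cannot be answered from it.
print("weight_kg" in row)  # False
```

Answering the new question would mean redesigning the schema and re-running the conversion over all the source text.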
Can overlaying a semantic model onto the data be the answer?
• Extract a semantic (meta) model of interest from the unstructured data.
• Use the structure and semantics of this model to formulate rich searches and queries.
• For example, techniques used when searching and comparing products:
  – Relevant attributes are extracted from product descriptions to form a model.
  – These attributes are used to formulate rich searches, queries, and comparisons.
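The product example can be sketched as follows. The extractor patterns, the attribute names, and the ProductModel class are all hypothetical. The extracted model keeps a pointer (doc_id) back to the unchanged source text, so the model is overlaid on the data rather than replacing it, and further extractors can be added later to capture new attributes.

```python
import re
from dataclasses import dataclass, field


def _number(pattern, text):
    """Return the first captured number as a float, or None if absent."""
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


# Hypothetical extractors: attribute name -> function over the raw description.
EXTRACTORS = {
    "ram_gb": lambda t: _number(r"(\d+)\s*GB RAM", t),
    "weight_kg": lambda t: _number(r"(\d+(?:\.\d+)?)\s*kg", t),
    "price_usd": lambda t: _number(r"price\s+(\d+(?:\.\d+)?)\s*USD", t),
}


@dataclass
class ProductModel:
    doc_id: int  # index of the unchanged source description
    attributes: dict = field(default_factory=dict)


def build_model(docs, extractors=EXTRACTORS):
    """Overlay a semantic model on the descriptions without modifying them."""
    return [
        ProductModel(i, {name: extract(text) for name, extract in extractors.items()})
        for i, text in enumerate(docs)
    ]


def query(models, predicate):
    """Rich query: doc_ids whose extracted attributes satisfy the predicate."""
    return [m.doc_id for m in models if predicate(m.attributes)]


def compare(a, b):
    """Side-by-side comparison of two products, attribute by attribute."""
    keys = sorted(a.attributes.keys() | b.attributes.keys())
    return {k: (a.attributes.get(k), b.attributes.get(k)) for k in keys}


docs = [
    "UltraBook 14 laptop, 16 GB RAM, 1.2 kg, price 1499 USD",
    "Budget laptop with 8 GB RAM, 2.1 kg, price 499 USD",
    "Workstation laptop, 32 GB RAM, 2.6 kg, price 949 USD",
]
models = build_model(docs)

# Products with at least 16 GB RAM that weigh under 2 kg (missing values fail).
light_and_capable = query(
    models,
    lambda a: a["ram_gb"] is not None and a["ram_gb"] >= 16
    and a["weight_kg"] is not None and a["weight_kg"] < 2.0,
)
print(light_and_capable)  # [0]
print(compare(models[0], models[2]))
```

Adding a new attribute means registering one more extractor and rebuilding the model; the descriptions themselves are never rewritten.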
Can mining work for both structured and unstructured data?
Diagram layers (top to bottom): Reports, Queries, Simple Semantic (Meta) Data Model, multiple Data Storage Model(s), Data
• Queries and search: They leverage the structure of the data model to express queries and searches that are rich in semantics.
• Semantic (meta) data model: A simple semantic data representation model for modeling both structured and unstructured data. Meta-data based on ontologies is extracted from the underlying data.
• Storage models: Multiple storage models, including relational, XML, text, etc.
• A separate logical data (meta) model, distinct from the underlying storage model
• Extracted from the data in a non-intrusive fashion and captured as meta-data
• A single data representation model can map to multiple storage models
• The structure and semantics of the meta-data help structure queries, search, and reports
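A sketch of one logical model mapped onto three storage models: a relational table via sqlite3, an XML document via xml.etree.ElementTree, and plain text via regular expressions. The attribute names, table and element names, and the text pattern are hypothetical. The point is that the query is written once against the logical attributes (name, price_usd) rather than against each storage format.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical logical (meta) model shared by every storage model.
LOGICAL_ATTRIBUTES = ("name", "price_usd")


def from_relational(conn):
    """Map rows of a (hypothetical) product table onto the logical model."""
    rows = conn.execute("SELECT name, price FROM product")
    return [dict(zip(LOGICAL_ATTRIBUTES, row)) for row in rows]


def from_xml(xml_text):
    """Map (hypothetical) <item><title/><cost/></item> elements onto the same model."""
    root = ET.fromstring(xml_text)
    return [
        {"name": item.findtext("title"), "price_usd": float(item.findtext("cost"))}
        for item in root.iter("item")
    ]


TEXT_PATTERN = re.compile(r"^(?P<name>.+?) costs (?P<price>\d+(?:\.\d+)?) USD$")


def from_text(lines):
    """Extract the logical attributes from free text; the text itself is untouched."""
    instances = []
    for line in lines:
        match = TEXT_PATTERN.match(line)
        if match:
            instances.append(
                {"name": match["name"], "price_usd": float(match["price"])}
            )
    return instances


def query(instances, max_price_usd):
    """Written once against the logical model, so it runs over every source."""
    return sorted(i["name"] for i in instances if i["price_usd"] <= max_price_usd)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [("Desk", 300.0), ("Chair", 120.0)])
xml_doc = (
    "<catalog><item><title>Lamp</title><cost>45</cost></item>"
    "<item><title>Sofa</title><cost>900</cost></item></catalog>"
)
text_doc = ["Bookshelf costs 150 USD", "Rug costs 80 USD"]

instances = from_relational(conn) + from_xml(xml_doc) + from_text(text_doc)
print(query(instances, max_price_usd=200))  # ['Bookshelf', 'Chair', 'Lamp', 'Rug']
```

Each adapter is the only place that knows its storage format; supporting another store would mean adding one more adapter, with no change to the query.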
Are embedded tags in the data a possible approach to defining ontology structures?
Is it feasible to extract such semantic models, and can mining based on them perform well?
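One possible reading of the embedded-tag question, sketched under strong assumptions: an inline <concept>value</concept> tag syntax and a tiny is-a ontology (ONTOLOGY), both hypothetical. Tags that the ontology defines are lifted into concept instances, and a query for a general concept also returns values tagged with its sub-concepts. This illustrates the idea only; it does not settle the feasibility question.

```python
import re
from collections import defaultdict

# Hypothetical ontology: concept -> parent concept (a simple is-a hierarchy).
ONTOLOGY = {"component": None, "processor": "component", "memory": "component"}

# Hypothetical embedded-tag syntax: <concept>value</concept> inside free text.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")


def extract_tagged(text, ontology=ONTOLOGY):
    """Group tagged values by concept; tags the ontology does not define are ignored."""
    found = defaultdict(list)
    for concept, value in TAG.findall(text):
        if concept in ontology:
            found[concept].append(value)
    return dict(found)


def ancestors(concept, ontology=ONTOLOGY):
    """Walk the is-a chain upward, e.g. processor -> component."""
    chain = []
    parent = ontology.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = ontology.get(parent)
    return chain


def values_for(concept, tagged, ontology=ONTOLOGY):
    """Values tagged with the concept itself or with any of its sub-concepts."""
    return [
        value
        for tagged_concept, values in tagged.items()
        if tagged_concept == concept or concept in ancestors(tagged_concept, ontology)
        for value in values
    ]


doc = ("Laptop with <processor>8-core CPU</processor>, "
       "<memory>16 GB RAM</memory> and a <color>silver</color> case")
tagged = extract_tagged(doc)
print(tagged)  # {'processor': ['8-core CPU'], 'memory': ['16 GB RAM']}
print(values_for("component", tagged))  # ['8-core CPU', '16 GB RAM']
```

The untagged "color" value is skipped because the ontology has no such concept, which shows how any tag-based approach depends on the tags and the ontology agreeing.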