Taxonomy Development Workshop


Text Analytics And Text Mining
Best of Text and Data
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Text Analytics Capabilities
 Text Analytics Applications
 Text Mining and Text Analytics
– Data and Unstructured Content
 Case Study – Text Mining for Taxonomy Development
 Conclusion
KAPS Group: General
 Knowledge Architecture Professional Services
 Virtual Company: Network of consultants – 8-10
 Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
 Consulting, Strategy, Knowledge architecture audit
 Services:
– Text Analytics evaluation, development, consulting, customization
– Knowledge Representation – taxonomy, ontology, Prototype
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics
Text Analytics Features
 Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
 Summarization
– Customizable rules, map to different content
 Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
 Sentiment Analysis
– Statistical, rules – full categorization set of operators
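
The "catalogs with variants" idea for extraction can be illustrated with a minimal Python sketch. The catalog entries, class labels, and function names below are hypothetical and chosen only for illustration; commercial text analytics platforms use much richer catalogs and rule languages.

```python
import re

# Hypothetical catalog: canonical entity -> (entity class, known variants).
CATALOG = {
    "SAS Institute": ("organization", ["SAS Institute", "SAS"]),
    "Resource Description Framework": (
        "concept",
        ["Resource Description Framework", "RDF"],
    ),
}


def build_patterns(catalog):
    """Compile one whole-word, case-sensitive pattern per canonical entity."""
    patterns = {}
    for canonical, (_entity_class, variants) in catalog.items():
        # Longest variants first so "SAS Institute" is matched before "SAS".
        alternatives = sorted(variants, key=len, reverse=True)
        joined = "|".join(re.escape(v) for v in alternatives)
        patterns[canonical] = re.compile(r"\b(?:" + joined + r")\b")
    return patterns


def extract_entities(text, catalog=CATALOG):
    """Return (canonical name, entity class, match count) for entities found."""
    results = []
    for canonical, pattern in build_patterns(catalog).items():
        count = len(pattern.findall(text))
        if count:
            results.append((canonical, catalog[canonical][0], count))
    return results


sample = "SAS Institute ships text analytics; SAS also supports RDF output."
print(extract_entities(sample))
```

Each variant maps back to one canonical name, which is what lets extracted entities feed facets consistently.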
Introduction to Text Analytics
Text Analytics Features
 Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean– Full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE
 This is the most difficult to develop
 Build on a Taxonomy
 Combine with Extraction, Sentiment
 Foundation for best text analytics & combination
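
As a rough illustration of how Boolean and proximity operators combine into a categorization rule, here is a small Python sketch. The `near` and `same_sentence` helpers only approximate the NEAR and SENTENCE operators named above, and the rule itself (an "oil and gas pipeline" category) is a hypothetical example, not the syntax of any particular product.

```python
import re


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, a, b, distance):
    """True if words a and b occur within `distance` tokens of each other."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def same_sentence(text, a, b):
    """True if a and b occur in the same sentence (naive punctuation split)."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(tokens(sentence))
        if a in words and b in words:
            return True
    return False


def matches_oil_gas_pipeline(body):
    """Hypothetical rule: (pipeline NEAR oil OR pipeline SENTENCE gas) AND NOT software."""
    positive = near(body, "pipeline", "oil", 5) or same_sentence(body, "pipeline", "gas")
    return positive and "software" not in tokens(body)


print(matches_oil_gas_pipeline("The pipeline carries crude oil across the state."))
print(matches_oil_gas_pipeline("Our software build pipeline failed. Oil prices fell."))
```

Rules like this are more work to write than training-set approaches, but they can be read, tested, and tuned term by term.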
Varieties of Taxonomy/ Text Analytics Software
 Taxonomy Management
– Synaptica, SchemaLogic
 Full Platform
– SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
 Content Management – embedded
 Embedded – Search
– FAST, Autonomy, Endeca, Exalead, etc.
 Specialty
– Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
– Ontology – extraction, plus ontology
Text Analytics Applications
Platform for Multiple Applications
 Content Aggregation, Duplicate Documents – save millions!
 Business intelligence, Customer Intelligence
 Social Media - sentiment analysis, Voice of the Customer
 Social – Hybrid folksonomy / taxonomy / auto-metadata
 Social – expertise, categorize tweets and blogs, reputation
 Ontology – travel assistant, semantic web, etc.
 eDiscovery, Reputation management, Customer Experience
 Expertise Location, Crowd sourcing Technical support
Text Analytics Applications:
Enterprise Search - Elements
 Text Analytics can “solve” enterprise search
 Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy - Subject matter / aboutness
 Software - Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
 People – tagging, evaluating tags, fine tune rules and taxonomy
 Rich Search Results – context and conversation
 Platform for search based applications
Text Analytics and Text Mining
Data and Unstructured Content
 80% of content is unstructured – adding to semantic web is major
 Text Analytics – content into data
– Big Data meets Big Content
 Real integration of text and ontology
– Beyond “hasDescription”
– Improve accuracy of extracted entities, facts – disambiguation
• Pipeline – oil & gas OR research / Ford
– Add Concepts, not just “Things” – 68% want this
 Semantic Web + Text Analytics = real world value
 Linked Data + Text Analytics – best of both worlds
 Build superior foundation elements – taxonomies, categorization
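
The "Pipeline" disambiguation example above can be sketched in Python as a context-overlap score. The sense labels and context word lists are hypothetical stand-ins for what a taxonomy or ontology would supply; this shows the general idea rather than a production method.

```python
import re

# Hypothetical sense inventory: each sense lists indicative context words.
SENSES = {
    "pipeline (oil & gas)": {"oil", "gas", "crude", "drilling", "refinery"},
    "pipeline (research)": {"drug", "clinical", "trial", "candidates", "research"},
}


def disambiguate(text, senses=SENSES):
    """Pick the sense whose context words overlap most with the text.

    Returns None when no sense has any overlap; ties go to the sense
    listed first in the inventory.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & context) for sense, context in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("The company has three drug candidates in its clinical pipeline."))
print(disambiguate("The pipeline moves crude oil to the refinery."))
```

The context vocabularies are exactly the kind of foundation element that taxonomies and categorization rules provide to an extraction process.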
Text Analytics and Text Mining and Data Mining
Vaccine Adverse Reaction
 Combine with Data Mining
 New sources of information
 News stories, medical records
 Blogs, social
 Find new connections, sources of knowledge
 Vaccine Adverse Effects – disease, symptoms, variables
 Unstructured text into a data source
 Some preliminary analysis, content structure
 Find unknown adverse effects and prevalence
 Drug Discovery + search / research – 5 year story
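
One simple way to turn unstructured reports into a data source is to count how often a vaccine and a symptom are mentioned in the same report. The sketch below assumes placeholder vocabularies ("vaccine a", "fever", and so on) and toy report text invented purely for illustration; it is not based on real clinical data, and a real project would need curated term lists and statistical controls.

```python
import re
from collections import Counter

# Hypothetical placeholder vocabularies, not real product or clinical term lists.
VACCINES = ["vaccine a", "vaccine b"]
SYMPTOMS = ["fever", "headache", "rash"]


def mentions(term, text):
    """Whole-word, case-insensitive test for a term in free text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def cooccurrence_counts(reports, vaccines=VACCINES, symptoms=SYMPTOMS):
    """Count reports that mention each (vaccine, symptom) pair at least once."""
    counts = Counter()
    for text in reports:
        for vaccine in vaccines:
            if not mentions(vaccine, text):
                continue
            for symptom in symptoms:
                if mentions(symptom, text):
                    counts[(vaccine, symptom)] += 1
    return counts


# Toy reports, invented for the example.
reports = [
    "Patient received Vaccine A and developed a fever and rash.",
    "After Vaccine A, mild fever.",
    "Vaccine B recipient reported headache.",
]
for pair, n in cooccurrence_counts(reports).items():
    print(pair, n)
```

The resulting counts form an ordinary table that data mining tools can then analyze alongside structured variables.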
Text Analytics Applications
Example – Vaccine Adverse Effects
Text Analytics and Text Mining
Case Study – Taxonomy Development
 Problem – 200,000 new uncategorized documents
 Old taxonomy – need one that reflects change in corpus
 Text mining, entity extraction, categorization
 Bottom up – terms in documents – frequency, date
 Clustering – suggested categories
 Clustering – chunking for editors
 Time savings – only feasible way to scan documents
 Quality – important terms, co-occurring terms
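
To make the clustering step concrete, here is a minimal Python sketch that groups documents by shared terms so editors can review them in chunks. The stopword list, the Jaccard threshold, and the greedy single-pass strategy are all assumptions made for illustration; the case study does not say which clustering algorithm was used.

```python
import re

# Small illustrative stopword list (an assumption, not from the case study).
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on"}


def term_set(text):
    """Distinct lowercase terms in a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.2):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed term set, [document indices])
    for i, text in enumerate(docs):
        terms = term_set(text)
        for seed, members in clusters:
            if jaccard(terms, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((terms, [i]))
    return [members for _seed, members in clusters]


docs = [
    "Deep water drilling safety",
    "Drilling safety in deep water wells",
    "Protein folding simulation",
    "Simulation of protein folding",
]
print(cluster(docs))
```

Each resulting group is a candidate category for an editor to name, merge, or discard.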
Text Analytics and Text Mining
Case Study – Taxonomy Development
 Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms
 Add Data: PubDate, journalTitle, Taxonomy Node
 Terms – Map to frequency, date, date ranges, Taxonomy Node
– New Terms, Trends
 Relevance – frequency, Abstract, Title, human judgment
 Entity Extraction – Authors, Organizations, Products
 Categorization – build on clusters & taxonomy
 Combination – reports, visualizations, interactive explorations
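
The mapping of terms to frequency and date can be sketched as follows. The field names Title, Abstract, and PubDate follow the list above; the ISO date format, the sample records, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def terms(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_year_counts(records):
    """Map each term to a Counter of publication year -> frequency."""
    table = defaultdict(Counter)
    for rec in records:
        year = int(rec["PubDate"][:4])  # assumes ISO dates such as "2010-06-01"
        for field in ("Title", "Abstract"):
            for term in terms(rec[field]):
                table[term][year] += 1
    return table


def new_terms(table, since_year):
    """Terms never seen before `since_year`: candidates for new taxonomy nodes."""
    return sorted(t for t, years in table.items() if min(years) >= since_year)


# Invented sample records for the example.
records = [
    {"Title": "Shale gas", "Abstract": "Gas extraction", "PubDate": "2005-01-01"},
    {"Title": "Fracking shale", "Abstract": "Fracking methods", "PubDate": "2010-06-01"},
]
table = term_year_counts(records)
print(new_terms(table, 2008))
```

A table like this supports both trend reports (a term's counts by year) and the review of new terms by human editors.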
Conclusion
 Text Analytics impact is huge – solve information overload
 Enterprise Search and Search Based Applications: Save millions and enhance productivity
 Combination of Text Analytics & Text Mining – unlimited range of applications
 Mutual Enrichment – more data, add structure to unstructured
 Add Ontology = Richer Text Analytics – smarter, more useful
 Text Analytics + Text Mining + Semantic Web
– Move from theory to new practical applications
 The best is yet to come!
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com