Taxonomy Development Workshop
Text Analytics And Text Mining
Best of Text and Data
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Text Analytics Capabilities
Text Analytics Applications
Text Mining and Text Analytics
– Data and Unstructured Content
Case Study – Text Mining for Taxonomy Development
Conclusion
KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Text Analytics evaluation, development, consulting, customization
– Knowledge Representation – taxonomy, ontology, Prototype
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics
Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
Summarization
– Customizable rules, map to different content
Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
Sentiment Analysis
– Statistical, rules – full categorization set of operators
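As an illustration of the "catalogs with variants" idea above, here is a minimal Python sketch of catalog-based entity extraction. The catalog contents, the entity class labels, and the function name `extract_entities` are all hypothetical; commercial text analytics platforms use far richer, rule-driven catalogs than this.

```python
import re

# Hypothetical catalog: canonical entity -> (entity class, surface variants).
CATALOG = {
    "International Business Machines": (
        "organization",
        ["IBM", "I.B.M.", "International Business Machines"],
    ),
    "SAS Institute": ("organization", ["SAS", "SAS Institute"]),
}


def extract_entities(text, catalog=CATALOG):
    """Return {canonical name: entity class} for every catalog entry found in text."""
    found = {}
    for canonical, (entity_class, variants) in catalog.items():
        for variant in variants:
            # Match the variant as a whole token, not inside a longer word.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text):
                found[canonical] = entity_class
                break  # one matching variant is enough for this entity
    return found


if __name__ == "__main__":
    print(extract_entities("I.B.M. and SAS announced a partnership."))
```

Mapping every variant back to one canonical name is what lets the extracted entities feed a facet without splitting counts across spellings.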
Introduction to Text Analytics
Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE
This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction, Sentiment
Foundation for best text analytics & combination
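To make the rule types above concrete, here is a minimal Python sketch of a categorization rule that combines position in text, Boolean operators, and NEAR / SENTENCE style operators. The helper names, the example terms, and the rule itself are hypothetical; each vendor's rule language has its own syntax and semantics, so this only approximates the idea.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, a, b, distance):
    """True if tokens a and b occur within `distance` tokens of each other."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def same_sentence(text, a, b):
    """True if a and b occur in the same sentence (naive split on . ! ?)."""
    for sentence in re.split(r"[.!?]+", text):
        toks = set(tokens(sentence))
        if a in toks and b in toks:
            return True
    return False


def matches_pipeline_incident(title, body):
    """Hypothetical rule for a 'pipeline incident' category:
    (title contains pipeline OR pipeline NEAR/5 leak OR SENTENCE(pipeline, spill))
    AND NOT software.
    """
    in_title = "pipeline" in tokens(title)
    in_body = near(body, "pipeline", "leak", 5) or same_sentence(body, "pipeline", "spill")
    excluded = "software" in tokens(body)
    return (in_title or in_body) and not excluded


if __name__ == "__main__":
    print(matches_pipeline_incident(
        "Quarterly report", "The pipeline suffered a small leak near the river."))
```

The NOT clause shows why such rules take effort to develop: without it, a document about a software pipeline with a memory leak would land in the same category.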
Varieties of Taxonomy/ Text Analytics Software
Taxonomy Management
– Synaptica, SchemaLogic
Full Platform
– SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
Content Management – embedded
Embedded – Search
– FAST, Autonomy, Endeca, Exalead, etc.
Specialty
– Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
– Ontology – extraction, plus ontology
Text Analytics Applications
Platform for Multiple Applications
Content Aggregation, Duplicate Documents – save millions!
Business intelligence, Customer Intelligence
Social Media - sentiment analysis, Voice of the Customer
Social – Hybrid folksonomy / taxonomy / auto-metadata
Social – expertise, categorize tweets and blogs, reputation
Ontology – travel assistant, semantic web, etc.
eDiscovery, Reputation management, Customer Experience
Expertise Location, Crowd sourcing Technical support
Text Analytics Applications:
Enterprise Search - Elements
Text Analytics can “solve” enterprise search
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
People – tagging, evaluating tags, fine-tuning rules and taxonomy
Rich Search Results – context and conversation
Platform for search based applications
Text Analytics and Text Mining
Data and Unstructured Content
80% of content is unstructured – adding to semantic web is major
Text Analytics – content into data
– Big Data meets Big Content
Real integration of text and ontology
– Beyond “hasDescription”
– Improve accuracy of extracted entities, facts – disambiguation
• Pipeline – oil & gas OR research / Ford
– Add Concepts, not just “Things” – 68% want this
Semantic Web + Text Analytics = real world value
Linked Data + Text Analytics – best of both worlds
Build superior foundation elements – taxonomies, categorization
Text Analytics and Text Mining and Data Mining
Vaccine Adverse Reaction
Combine with Data Mining
New sources of information
News stories, medical records
Blogs, social
Find new connections, sources of knowledge
Vaccine Adverse Effects – disease, symptoms, variables
Unstructured text into a data source
Some preliminary analysis, content structure
Find unknown adverse effects and prevalence
Drug Discovery + search / research – 5 year story
Text Analytics Applications
Example – Vaccine Adverse Effects
Text Analytics and Text Mining
Case Study – Taxonomy Development
Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Bottom Up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
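The "co-occurring terms" signal mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name, the whitespace tokenization, and the sample documents are assumptions, not the tooling used in the case study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs, vocabulary):
    """Count how many documents contain each pair of vocabulary terms."""
    pair_counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        # Sort so each pair is always stored in the same order.
        present = sorted(vocabulary & words)
        pair_counts.update(combinations(present, 2))
    return pair_counts


if __name__ == "__main__":
    sample_docs = [
        "vaccine adverse reaction",
        "vaccine reaction study",
        "taxonomy study",
    ]
    vocab = {"vaccine", "reaction", "taxonomy", "study"}
    for pair, count in cooccurrence(sample_docs, vocab).most_common():
        print(pair, count)
```

Pairs that co-occur often are candidates for sibling or parent-child terms, which an editor can then accept or reject when reviewing suggested categories.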
Text Analytics and Text Mining
Case Study – Taxonomy Development
Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms
Add Data: PubDate, journalTitle, Taxonomy Node
Terms – Map to frequency, date, date ranges, Taxonomy Node
– New Terms, Trends
Relevance – frequency, Abstract, Title, human judgment
Entity Extraction – Authors, Organizations, Products,
Categorization – build on clusters & taxonomy
Combination – reports, visualizations, interactive explorations
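The "text into data" step above, where terms are mapped to frequency and date to surface new terms and trends, might look roughly like the following Python sketch. The record layout reuses the field names from the slide (Title, Abstract, PubDate), but the date format, function names, and sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def term_frequency_by_year(records, terms):
    """Map each term to a Counter of {year: number of records mentioning it}."""
    by_year = defaultdict(Counter)
    for record in records:
        # Assumes PubDate is an ISO-style string such as "2011-03-01".
        year = int(record["PubDate"][:4])
        text = (record["Title"] + " " + record["Abstract"]).lower()
        for term in terms:
            # Naive substring match; a real pipeline would tokenize and stem.
            if term in text:
                by_year[term][year] += 1
    return by_year


def new_terms(by_year, since_year):
    """Terms whose earliest mention falls in or after since_year."""
    return sorted(t for t, counts in by_year.items() if min(counts) >= since_year)


if __name__ == "__main__":
    sample = [
        {"Title": "Graphene sensors", "Abstract": "A study of graphene.", "PubDate": "2011-03-01"},
        {"Title": "Polymer films", "Abstract": "Polymer coatings.", "PubDate": "2005-06-10"},
        {"Title": "Polymer and graphene", "Abstract": "Composite.", "PubDate": "2012-01-15"},
    ]
    freq = term_frequency_by_year(sample, ["graphene", "polymer"])
    print(dict(freq))
    print(new_terms(freq, 2010))
```

Terms that first appear only in recent years are the ones an old taxonomy is most likely to be missing.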
Case Study – Taxonomy Development
Conclusion
Text Analytics impact is huge – solve information overload
Enterprise Search and Search Based Applications: Save millions and enhance productivity
Combination of Text Analytics & Text Mining – unlimited range of applications
Mutual Enrichment – more data, add structure to unstructured
Add Ontology = Richer Text Analytics – smarter, more useful
Text Analytics + Text Mining + Semantic Web
– Move from theory to new practical applications
The best is yet to come!
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com