SLA_11_2004 - The University of Iowa

Data Mining
David Eichmann
School of Library and Information Science
The University of Iowa
Why?
• Given enough data represented through enough dimensions, we lose the ability to see the patterns
How?
• Decision Trees
• Nearest Neighbor Clustering
• Neural Networks
• Rule Induction
• K-Means Clustering
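As an illustration of one technique from the list above, here is a minimal sketch of k-means clustering in plain Python. The function names (`squared_distance`, `assign`, `kmeans`) and the sample points are assumptions made for this example, and the caller supplies the starting centroids so the run is repeatable; real implementations usually pick them at random.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
```

With starting centroids drawn from each group, the first three points end up in cluster 0 and the last three in cluster 1.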
What is it?
• The automated extraction of hidden predictive information from databases.
• Key points
  • Automated
  • Hidden
  • Predictive
The Typical Process
Evaluation Criteria
• Receiver Operating Characteristic Curves
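A receiver operating characteristic curve plots the true positive rate against the false positive rate as the decision threshold is lowered. The sketch below shows one way to compute the curve and the area under it; the function names and the sample scores are assumptions made for this example, and it assumes both classes are present in the labels.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per distinct score."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for index, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a run of tied scores.
        last_of_tie = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if last_of_tie:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier output: higher score means "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

An area of 1.0 means every positive outranks every negative, and 0.5 is what random ranking would give; this toy example lands at 8/9.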
But Nobody Said We Had To Do MATH…
Forms of Data
• Structured
  • Databases
  • Forms
• Semi-Structured
  • Tables on the Web
  • Bibliographic citations
  • Graphs & charts
• Unstructured
  • Full text (e.g., journal articles, physician chart notes)
  • Images
Text Mining
• The corpus is now a collection of text artifacts
  • Full text when you’ve got it (e.g., newswire)
  • Metadata when you don’t (e.g., MEDLINE)
• The trick then becomes extracting ‘interesting’ relationships between ‘interesting’ entities
  • Who killed whom
  • Who works for whom
  • Who makes what
The Classic Entities
• Persons
• Organizations
• Places (Geography)
• Events
A Newswire Example
• APW19981001.0262 [Israel(0.271), Jonathan Pollard(0.153), Benjamin Netanyahu(0.102), Bill Clinton(0.102), United States(0.055), ...]
• Persons
  • Bill Clinton (3)
  • Jonathan Pollard (8)
  • Moshe Fogel (2)
  • Benjamin Netanyahu (2)
  • Israeli Embassy (1)
• Organizations
  • Cabinet (1)
• Places
  • Israel (16)
  • United States (5)
  • Washington (2)
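The per-type mention counts above can be held in a small nested mapping and ranked. The sketch below does that with the counts exactly as listed; the function name `ranked_entities` is an assumption made for this example. The shares it computes are plain relative frequencies, which do not reproduce the weights in the document vector above, so those weights evidently come from a different scheme.

```python
# Mention counts copied from the example above, grouped as listed there.
mentions = {
    "Persons": {
        "Bill Clinton": 3,
        "Jonathan Pollard": 8,
        "Moshe Fogel": 2,
        "Benjamin Netanyahu": 2,
        "Israeli Embassy": 1,
    },
    "Organizations": {"Cabinet": 1},
    "Places": {"Israel": 16, "United States": 5, "Washington": 2},
}


def ranked_entities(mentions_by_type):
    """Flatten the mapping and sort entities by mention count, highest first."""
    flat = [
        (name, entity_type, count)
        for entity_type, names in mentions_by_type.items()
        for name, count in names.items()
    ]
    total = sum(count for _, _, count in flat)
    ranked = sorted(flat, key=lambda item: item[2], reverse=True)
    # Append each entity's share of all mentions (simple relative frequency).
    return [(name, entity_type, count, count / total) for name, entity_type, count in ranked]


ranking = ranked_entities(mentions)
for name, entity_type, count, share in ranking[:3]:
    print(f"{name} ({entity_type}): {count} mentions, share {share:.3f}")
```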
In the Medical/Health Realm
• UMLS is an excellent framework
  • Organism
  • Chemical
  • Activity
  • Disease
A MEDLINE Example
• Document: 89316090 - Reconstructive surgery in Nicaragua
• Provided MeSH Keywords
  • Human
  • Nicaragua
    • Z01.107.169.690
  • Surgery, Plastic/*
    • G02.403.810.788
• Phrases
  • [Reconstructive, surgery]
  • [Nicaragua]
  • [letter]
• MeSH Terms
  • Surgery (1)
    • G02.403.810.762
  • Letter [Publication Type] (1)
• Other Phrases
Concept Extraction Example
• “Roman forces under Julius Caesar invade Britain.”
  (S (NP (NP Roman forces)
         (PP under
             (NP Julius Caesar)))
     (VP invade
         (NP Britain))
     .)
• Entity Attributes:
  • <organization Roman forces>
  • <person Julius Caesar>
  • <placename Britain>
• Concepts:
  • <Roman forces - under - Julius Caesar>
  • <Roman forces - invade - Britain>
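The step from the bracketed parse to the two concepts can be sketched as a small tree walk. This is an illustrative sketch only, not the system behind the slides: the function names are assumptions, the parser expects the simplified bracketing shown above (words sit directly under phrase labels, with no part-of-speech tags), and only two patterns are handled, a noun phrase modified by a prepositional phrase and a subject-verb-object sentence.

```python
import re


def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def child_nodes(node, label):
    """Direct sub-phrases of node carrying the given label."""
    return [c for c in node[1] if not isinstance(c, str) and c[0] == label]


def first_word(node):
    """First bare word directly under node (the preposition or verb here)."""
    return next((c for c in node[1] if isinstance(c, str)), "")


def head_text(np):
    """Text of the innermost leading NP, e.g. 'Roman forces'."""
    nested = child_nodes(np, "NP")
    if nested:
        return head_text(nested[0])
    return " ".join(c for c in np[1] if isinstance(c, str))


def concepts(tree):
    """Collect (left, relation, right) triples, visiting children before parents."""
    triples = []

    def visit(node):
        if isinstance(node, str):
            return
        for child in node[1]:
            visit(child)
        if node[0] == "NP":           # NP + PP: <head - preposition - object>
            for pp in child_nodes(node, "PP"):
                for obj in child_nodes(pp, "NP"):
                    triples.append((head_text(node), first_word(pp), head_text(obj)))
        if node[0] == "S":            # S: <subject - verb - object>
            for subj in child_nodes(node, "NP"):
                for vp in child_nodes(node, "VP"):
                    for obj in child_nodes(vp, "NP"):
                        triples.append((head_text(subj), first_word(vp), head_text(obj)))

    visit(tree)
    return triples


tree = parse("(S (NP (NP Roman forces) (PP under (NP Julius Caesar))) (VP invade (NP Britain)) .)")
for left, relation, right in concepts(tree):
    print(f"<{left} - {relation} - {right}>")
```

Run on the example tree, the walk yields the same two concepts listed above, in the same order.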
And a Small Demo…