What Researchers Want - UMD Department of Computer Science


Visual analytic tools for monitoring and
understanding the emergence and evolution
of innovations in science & technology
Cody Dunne
Dept. of Computer Science and
Human-Computer Interaction Lab,
University of Maryland
[email protected]
Links from this talk:
bit.ly/stmwant
OECD KNOWINNO Workshop
November 14-15, 2011 Alexandria, VA, USA
Outline
1. Academic literature exploration
2. Case study: Tree visualization techniques
3. Case study: Business intelligence news
4. Case study: Pennsylvania innovations
5. STICK approach
1. Academic literature exploration
Users are looking for:
1. Foundations
2. Emerging research topics
3. State of the art/open problems
4. Collaborations & relationships between communities
5. Field evolution
6. Easily understandable surveys
Action Science Explorer
User requirements
• Control over the paper collection
– Choose custom subset via query, then iteratively drill
down, filter, & refine
• Overview either as visualization or text statistics
– Orient within subset
• Easy to understand metrics for identifying
interesting papers
– Ranking & filtering
• Create groups & annotate with findings
– Organize discovery process
– Share results
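The ranking requirement can be illustrated with one of the simplest easy-to-understand metrics: how often a paper is cited within the chosen collection. This is a hypothetical sketch, not Action Science Explorer's code. It assumes citations are available as (citing, cited) ID pairs, and the paper IDs and `rank_by_in_citations` are made up for illustration.

```python
from collections import Counter

# Hypothetical (citing_paper, cited_paper) pairs for a small collection.
CITATIONS = [
    ("P2", "P1"), ("P3", "P1"), ("P4", "P1"),
    ("P3", "P2"), ("P4", "P2"), ("P4", "P3"),
]


def rank_by_in_citations(edges, min_citations=0):
    """Rank papers by how often they are cited inside the collection.

    Only papers cited at least once appear; papers with fewer than
    `min_citations` citations are filtered out. Ties are broken by ID
    so the ordering is deterministic.
    """
    counts = Counter(cited for _, cited in edges)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(paper, n) for paper, n in ranked if n >= min_citations]


print(rank_by_in_citations(CITATIONS, min_citations=2))  # [('P1', 3), ('P2', 2)]
```

Raising `min_citations` is the filtering half of "ranking & filtering"; the sorted list is the ranking half.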
Action Science Explorer
• Bibliometric lexical link mining to create a
citation network and citation context
• Network clustering and multi-document
summarization to extract key points
• Powerful network analysis and visualization tools
www.cs.umd.edu/hcil/ase
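Network clustering groups papers that are linked by citations. As a deliberately simple stand-in (not the clustering method ASE uses), the sketch below treats citations as undirected links and returns connected components. A real tool would apply a community-detection algorithm so that a single connected field can still split into topics. The edge list and `citation_clusters` are hypothetical.

```python
from collections import defaultdict


def citation_clusters(edges):
    """Group papers into connected components of the undirected citation graph."""
    adjacency = defaultdict(set)
    for citing, cited in edges:
        adjacency[citing].add(cited)
        adjacency[cited].add(citing)

    seen, clusters = set(), []
    for start in sorted(adjacency):  # sorted for a deterministic result
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical citation edges forming two separate groups of papers.
EDGES = [("A2", "A1"), ("A3", "A1"), ("B2", "B1"), ("B3", "B2")]
print(citation_clusters(EDGES))  # [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']]
```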
2. Case study: Tree visualization
• Problem: Traditional 2D node-link diagrams of
trees become too large
• Solutions:
– Treemaps: Nested Rectangles
– Cone Trees: 3D Interactive Animations
– Hyperbolic Trees: Focus + Context
• Measures:
– Papers, articles, patents, citations,…
– Press releases, blog posts, tweets,…
– Users, downloads, sales,…
Treemaps: nested rectangles
www.cs.umd.edu/hcil/treemap-history
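The original treemap idea of nested rectangles is often explained with the "slice-and-dice" layout: each node's rectangle is divided among its children in proportion to their size, and the cut direction alternates at each level. A minimal sketch of that layout, assuming leaf sizes are known; `Node`, `slice_and_dice`, and the example tree are hypothetical, and later treemap variants (e.g. squarified layouts) are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0                      # used only for leaves
    children: list = field(default_factory=list)

    def weight(self) -> float:
        """Leaf size, or the sum of the children's weights."""
        if not self.children:
            return self.size
        return sum(child.weight() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, width, height) for every node.

    Each child gets a share of its parent's rectangle proportional to its
    weight; the cut direction alternates between the x and y axes at each
    level of the tree.
    """
    if out is None:
        out = []
    out.append((node.name, x, y, w, h))
    total = node.weight()
    offset = 0.0
    for child in node.children:
        frac = child.weight() / total if total else 0.0
        if depth % 2 == 0:  # split along the x axis (side-by-side strips)
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:               # split along the y axis (stacked strips)
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = Node("root", children=[
    Node("a", 3),
    Node("b", children=[Node("b1", 1), Node("b2", 1)]),
])
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 10, 10):
    print(f"{name}: x={x:.1f} y={y:.1f} w={w:.1f} h={h:.1f}")
```

Each leaf's area is proportional to its size (here "a" gets 60% of the area), and the rectangles of a subtree always stay inside their parent's rectangle, which is what makes the layout spatially nested.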
Smartmoney MarketMap Feb 27, 2007
smartmoney.com/marketmap
Cone trees: 3D interactive animations
Robertson, G. G., Card, S. K., and Mackinlay, J. D., Information visualization using 3D interactive animation,
Communications of the ACM, 36, 4 (1993), 51-71.
Robertson, G. G., Mackinlay, J. D., and Card, S. K., Cone trees: Animated 3D visualizations of hierarchical information, Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York, (April 1991), 189-194.
Hyperbolic trees: focus & context
Lamping, J. and Rao, R., Laying out and visualizing large trees using a hyperbolic space, Proc. 7th Annual ACM Symposium on User Interface Software and Technology, ACM Press, New York (1994), 13-14.
Lamping, J., Rao, R., and Pirolli, P., A focus+context technique based on hyperbolic geometry for visualizing large hierarchies, Proc. SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (1995), 401-408.
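Hyperbolic tree browsers of this kind display the tree in a unit disk (the Poincaré disk model), and moving the focus corresponds to a hyperbolic translation, which in that model can be written as a Möbius transformation. A minimal sketch of only that refocusing step, assuming node positions are already complex numbers inside the unit disk; `move_focus` and the positions are hypothetical, and the initial layout is not shown.

```python
def move_focus(z: complex, focus: complex) -> complex:
    """Hyperbolic translation of the Poincaré disk that sends `focus` to 0.

    The map z -> (z - a) / (1 - conj(a) * z) with |a| < 1 sends the open
    unit disk onto itself, so every node stays visible while the region
    around the new focus is magnified and the rest is compressed toward
    the rim.
    """
    if abs(focus) >= 1:
        raise ValueError("focus must lie strictly inside the unit disk")
    return (z - focus) / (1 - focus.conjugate() * z)


# Hypothetical node positions already laid out in the disk.
nodes = {"root": 0j, "child": 0.5 + 0j, "grandchild": 0.8 + 0.1j}
refocused = {name: move_focus(z, nodes["child"]) for name, z in nodes.items()}
for name, z in refocused.items():
    print(name, round(abs(z), 3))
```

After refocusing on "child", that node sits at the center, the former root moves toward the rim, and no node leaves the disk: this is the focus + context effect in miniature.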
[Figure: Tree visualization publishing. Series: patents, academic papers, and trade press articles for TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees]
[Figure: Tree visualization citations. Series: patents and academic papers for TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees]
Insights
• Emerging ideas may benefit from open access
• Compelling demonstrations with familiar
applications help
• Many components to commercial success
• 2D visualizations with spatial stability were successful
• Term disambiguation & data cleaning are hard
Shneiderman, B., Dunne, C., Sharma, P. & Wang, P. (2011), "Innovation trajectories
for information visualizations: Comparing treemaps, cone trees, and hyperbolic
trees", Information Visualization.
http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-16
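Part of the data-cleaning work is mapping spelling variants of a technique name onto one canonical term. A minimal sketch, assuming a hand-built table of regular-expression variants; `VARIANTS` and `canonical_terms` are hypothetical and not the procedure used in the study.

```python
import re

# Hypothetical spelling variants for each technique; a real study would
# compile these with domain experts and still review ambiguous matches.
VARIANTS = {
    "treemap": re.compile(r"\btree[\s-]?maps?\b", re.IGNORECASE),
    "cone tree": re.compile(r"\bcone[\s-]?trees?\b", re.IGNORECASE),
    "hyperbolic tree": re.compile(r"\bhyperbolic[\s-]?trees?\b", re.IGNORECASE),
}


def canonical_terms(text):
    """Return the sorted canonical technique names mentioned in `text`."""
    return sorted(name for name, pattern in VARIANTS.items() if pattern.search(text))


print(canonical_terms("A Tree-Map next to two cone trees"))  # ['cone tree', 'treemap']
```

Pattern matching handles spelling variation only. It cannot tell whether "tree map" in, say, a forestry article refers to the visualization technique, which is why disambiguation remains hard.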
3. Case study: Business intelligence news
ProQuest 2000-2009: term frequencies
Term                         Frequency  Term                              Frequency
hyperion                     3122       decision support system           39
data mining                  889        business process reengineering    36
business intelligence        434        data mart                         29
knowledge mgmt.              221        business analytics                21
data warehouse               207        text mining                       19
data warehousing             139        predictive analytics              18
cognos                       112        business performance mgmt         6
competitive intelligence     86         online analytical processing      5
electronic data interchange  69         knowledge discovery in database   1
meta data                    69         ad hoc query                      1
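The slides do not say whether "Frequency" counts articles or individual mentions. A minimal sketch of the first reading, assuming it is the number of articles that mention a term at least once; `document_frequency` and the article snippets are hypothetical, not ProQuest data.

```python
import re
from collections import Counter


def document_frequency(documents, terms):
    """Count the documents that mention each term at least once.

    Matching is case-insensitive and anchored on word boundaries.
    Terms that never appear are kept with a count of 0.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter({term: 0 for term in terms})
    for doc in documents:
        for term, pattern in patterns.items():
            if pattern.search(doc):
                counts[term] += 1
    return counts


# Hypothetical article snippets.
docs = [
    "Hyperion and Cognos announce a partnership",
    "Data mining programs draw scrutiny",
    "From data mining to the data warehouse",
]
freq = document_frequency(docs, ["hyperion", "data mining", "cognos", "text mining"])
print(freq.most_common())
```

Counting documents rather than mentions keeps one article that repeats a term many times from dominating the table.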
[Figure: PQ Business Intelligence 2000-2009, co-occurrence of concepts with organizations. Data Mining: frequency by year for National Security Agency, NSA, White House, FBI, AT&T, American Civil Liberties Union, Electronic Frontier Foundation, Dept. of Homeland Security, CIA]
Business Intelligence 2000-2009: Matrix showing co-occurrence of concepts and organizations
Business Intelligence 2000-2009 (subset)
Business Intelligence 2000-2009: Data mining
• NSA
• CIA
• FBI
• White House
• Pentagon
• DOD
• DHS
• AT&T
• ACLU
• EFF
• Senate Judiciary Committee
Business Intelligence 2000-2009:
Tech1
• Google
• Yahoo
• Stanford
• Apple
Tech2
• IBM, Cognos
• Microsoft
• Oracle
Finance
• NASDAQ
• NYSE
• SEC
• NCR
• MicroStrategy
Business Intelligence 2000-2009:
• Air Force
• Army
• Navy
• GSA
• UMD*
Insights
• Useful groupings in PQ BI terms based on events and long-term collaborators
• Interactive line charts useful for looking at co-occurrence relationships over time
• Clustered heatmaps useful for overall co-occurrence relationships
stick.ischool.umd.edu
4. Case study: Pennsylvania innovations
• Innovation relationships during 1990
– State & federal funding
– Patents (both strong and weak ties)
– Location
• Connecting
– State & federal agencies
– Universities
– Firms
– Inventors
[Figure: Pennsylvania innovation network, repeated across four slides. Legend: Patent, Tech, SBIR (federal), PA DCED (state), Related patent. Groups: 2: Federal agency; 3: Enterprise; 5: Inventors; 9: Universities; 10: PA DCED; 11/12: Phil/Pitt metro counties; 13-15: Semi-rural/rural counties; 17: Foreign countries; 19: Other states. Labeled in later slides: No Location, Philadelphia, Pittsburgh Metro, Navy, Pharmaceutical/Medical, Westinghouse Electric]
Insights
• Meta-layouts useful for showing:
– Groups (clusters, attributes, manual)
– Relationships between them
• User comments
– “We've never been able to see anything like this”
– “This is going to be huge”
www.terpconnect.umd.edu/~dempy/
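Besides the groups themselves, a meta-layout needs the strength of the relationships between groups. A minimal sketch of that aggregation step, assuming every node already carries a group label; the node IDs and `group_graph` are hypothetical, the group names are taken from the figure legend, and placing each group in its own screen region is not shown.

```python
from collections import Counter


def group_graph(edges, group_of):
    """Collapse a node-level graph into weighted edges between groups.

    Each undirected edge (u, v) adds 1 to the pair of groups it connects;
    edges inside a single group are counted on (g, g).
    """
    weights = Counter()
    for u, v in edges:
        a, b = sorted((group_of[u], group_of[v]))
        weights[(a, b)] += 1
    return weights


# Hypothetical nodes and ties.
group_of = {"U1": "Universities", "U2": "Universities",
            "F1": "Enterprise", "A1": "Federal agency"}
edges = [("U1", "F1"), ("U2", "F1"), ("A1", "F1"), ("U1", "U2")]
print(group_graph(edges, group_of))
```

The resulting weights can drive the thickness of links drawn between group regions, while the (g, g) entries indicate how tightly knit each group is internally.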
5. STICK approach
• NSF SciSIP Program
– Science of Science & Innovation Policy
– Goal: Scientific approach to science policy
• The STICK Project
– Science & Technology Innovation Concept
Knowledge-base
– Goal: Monitoring, Understanding, and Advancing
the (R)Evolution of Science & Technology
Innovations
STICK approach cont…
• Scientific, data-driven way to track innovations
– Vs. current expert-based, time-consuming approaches (e.g., Gartner’s Hype Cycle, tire track diagrams)
• Includes both concept and product forms
– Study relationships between them
• Study the innovation ecosystem
– Organizations & people
– Both those producing & using innovations
stick.ischool.umd.edu
STICK Process (overview)
• Identify concepts
– Business intelligence, cloud computing, customer relationship management, health IT, web 2.0, electronic health records, biotech
• Query data sources
– News, Dissertation, Academic, Patent, Blogs
• Processing
– Automatic entity recognition
– Crowd-sourced verification
– Co-occurrence networks
• Visualizing & analyzing
– Overall statistics
– Co-occurrence networks
– Network evolution
• Sharing results
Process
1. Collecting
2. Processing
3. Visualizing & Analyzing
4. Collaborating
Cleaning
Collecting
Identify Concepts
• Begin with target concepts
– Business Intelligence
– Health IT
– Cloud Computing
– Customer Relationship Management
– Web 2.0
– Personal Health Records
– Nanotechnology
• Develop 20-30 subconcepts from domain experts and wikis
Data Sources
• News
• Dissertation
• Academic
• Patent
• Blogs
Collecting (2)
• Form & Expand Queries
ABS(
"customer relationship management" OR
"customers relationship management" OR
"customer relation management"
) OR TEXT(…) OR SUB(…) OR TI(…)
• Scrape Results
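The expanded query above repeats one OR-clause of spelling variants across several field codes. A sketch of generating such strings, assuming the field-code syntax shown on the slide; whether a particular database accepts exactly this form has not been verified, and `build_query` is hypothetical.

```python
def build_query(variants, fields=("ABS", "TEXT", "SUB", "TI")):
    """OR together the quoted variants, then repeat that clause per field."""
    clause = " OR ".join(f'"{variant}"' for variant in variants)
    return " OR ".join(f"{name}({clause})" for name in fields)


query = build_query(
    ["customer relationship management", "customer relation management"],
    fields=("ABS", "TI"),
)
print(query)
```

Generating the string from one list of variants keeps the clauses for different fields in sync when a new variant is added.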
Processing
Automatic Entity Recognition
• BBN IdentiFinder
Crowd-Sourced Verification
• Extract most frequent 25%
• Assign to CrowdFlower
– Workers check organization
names and sample sentences
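The slide does not say whether "most frequent 25%" is measured over distinct entities or over total mentions. A minimal sketch assuming distinct entities ranked by mention count; `top_fraction` and the entity names are hypothetical, and the hand-off to CrowdFlower is not shown.

```python
import math
from collections import Counter


def top_fraction(mentions, fraction=0.25):
    """Return the most frequently mentioned `fraction` of distinct entities.

    At least one entity is returned when there is any input; ties are
    broken alphabetically.
    """
    counts = Counter(mentions)
    if not counts:
        return []
    k = max(1, math.ceil(len(counts) * fraction))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical entity mentions produced by an entity recognizer.
mentions = ["OrgA"] * 5 + ["OrgB"] * 3 + ["OrgC"] * 2 + ["OrgD", "OrgE"]
print(top_fraction(mentions))  # ['OrgA', 'OrgB']
```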
Processing (2)
• Compute Co-Occurrence Networks
– Overall edge weights
– Slice by time to see network evolution
• Output: CSV, GraphML
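Computing a co-occurrence network amounts to counting, for each document, every pair of entities it mentions, both overall and per year so that the network can be sliced by time. A minimal sketch under stated assumptions: documents arrive as (year, entities) records, the data and function names are hypothetical, and only CSV output is shown (GraphML export is omitted).

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Build co-occurrence edge weights from (year, entities) records.

    Each unordered pair of distinct entities in one document adds 1 to the
    overall weight and to the weight for that document's year.
    """
    overall = Counter()
    by_year = defaultdict(Counter)
    for year, entities in docs:
        for a, b in combinations(sorted(set(entities)), 2):
            overall[(a, b)] += 1
            by_year[year][(a, b)] += 1
    return overall, by_year


def to_csv(weights):
    """Serialize edge weights as a source,target,weight CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (a, b), weight in sorted(weights.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()


# Hypothetical documents: (publication year, recognized entities).
docs = [
    (2005, ["OrgA", "OrgB", "data mining"]),
    (2006, ["OrgA", "data mining"]),
    (2006, ["OrgC", "OrgA", "data mining"]),
]
overall, by_year = cooccurrence(docs)
print(to_csv(overall))
print(to_csv(by_year[2006]))
```

Keeping one Counter per year means each time slice can be exported and compared separately, which is what the network-evolution views need.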
Visualizing & Analyzing
Spotfire
• Import CSV, Database
• Standard charts
• Multiple coordinated views
• Highly scalable
NodeXL
• CSV, Spigots, GraphML
• Automate feature
– Batch analysis & visualization
• Excel 2007/2010 template
Shared data & analysis repositories
• Online Research Community
• Share data, tools, results
– Data & analysis downloads
– Spotfire Web Player
• Communication
• Co-creation, co-authoring
stick.ischool.umd.edu/community
Ongoing Work
Collecting:
Additional data sources and queries
Processing:
Improving entity recognition accuracy
Visualizing & Analyzing:
Visualizing network evolution
• Co-occurrence network sliced by time
Collaborating: Develop the STICK Open Community site
• Motivate user participation
• Improve the resources available
• Invitation-only testing
Outline
1. Academic literature exploration
– Citation networks and text summarization
2. Case study: Tree visualization techniques
– Papers, patents, and trade press articles
3. Case study: Business intelligence news
– News term co-occurrence
4. Case study: Pennsylvania innovations
– Patents, funding, and locations
5. STICK approach
– Tracking innovations across papers, patents, news articles, and blog posts
Take Away Messages
• Easier scientific, data-driven innovation analysis:
– Automatic collection & processing of innovation data
– Easy access to visual analytic tools for finding clusters,
trends, outliers
– Communities for sharing data, tools, & results
This work has been partially supported by
NSF grants IIS 0705832 (ASE) and
SBE 0915645 (STICK)