An Application of Text Mining in
Strategic Technical Planning
Paul Frey – Search Technology, Inc.
Nils Newman – Intelligent Information Services Corp.
Robert Watts – U.S. Army TACOM
Alan Porter – Georgia Institute of Technology
Symposium on Technical Intelligence
Division of Chemical Information
American Chemical Society National Meeting
San Diego, California, April 4, 2001
Preliminary Comments
• A word of caution …
– I am one of those people that previous speakers warned
you about, because …
– I make and sell “hammers.”
– a.k.a. software tools to assist (not replace!) a skilled
Technical Intelligence analyst
• Work reported occurred in late 1996
– Using an earlier prototype of VantagePoint
– Screen-shots in this presentation are re-creations
– But the story is true
Statement of the Problem
Domain: U.S. Army Tank-automotive and
Armaments Command (TACOM)
• Replacement of high-cost, out-of-tolerance
engine components is expensive
• Potential Solution: Recondition worn-out
components using thermal-spray coatings
• Institutional Barrier: Related R&D program
in the early 1980s was “too early”
Innovation Forecasting
• Search on basic topical terms in multiple data
sources
• Examine results to refine query and re-do search
• Plot major trends and model the life cycle
• Identify qualitative categories for assessment of
technology life cycle (e.g., academic and corporate
research)
• Slice data by time and compare
• Identify and analyze special areas (e.g., gap
analysis)
Initial Query
• EI Compendex
• “Ceramic” NEAR “Engine” (and variants)
• ~800 records
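A proximity query of this kind can be sketched in a few lines. This is only an illustration: the five-word window, the prefix matching used to catch "variants", and the sample titles are assumptions, not the actual EI Compendex NEAR semantics or records.

```python
import re

def near(text, term_a, term_b, window=5):
    """True if a word starting with term_a occurs within `window` words
    of a word starting with term_b (prefix match covers simple variants)."""
    words = re.findall(r"[a-z]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w.startswith(term_a)]
    pos_b = [i for i, w in enumerate(words) if w.startswith(term_b)]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

# Made-up titles for illustration only
titles = [
    "Ceramic coatings for diesel engine components",
    "Ceramics in dental restoration",
    "Engine wear models",
]
hits = [t for t in titles if near(t, "ceramic", "engine")]
```

Only the first title contains both terms within the window, so only it is kept.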
Examine Publication Trend by
Industry Segment (Overview)
1. Import the raw data (Import Filters/Editor)
2. Extract Year of Publication from a coded
field (Thesaurus)
3. Clean up “Corporate Source” field
4. Assign industry segment in “Corporate
Source” field (Thesaurus Groups)
5. Create a co-occurrence matrix
6. Plot the trend (VBScript, MS Excel)
Extract Publication Year
• Capture raw data
• Apply a thesaurus to
condense the raw data
User-managed thesauri are based on “Regular Expressions”
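As a rough sketch of a regular-expression thesaurus, the snippet below condenses a raw coded field to a four-digit year. The field layout in the samples and the names YEAR_THESAURUS and apply_thesaurus are hypothetical; this is not VantagePoint's thesaurus format.

```python
import re

# Hypothetical thesaurus: (pattern, replacement) pairs, tried in order
YEAR_THESAURUS = [
    (re.compile(r"\b(19\d{2}|20\d{2})\b"), r"\1"),
]

def apply_thesaurus(raw, thesaurus):
    """Return the condensed term for the first matching pattern, else None."""
    for pattern, replacement in thesaurus:
        match = pattern.search(raw)
        if match:
            return match.expand(replacement)
    return None

# Made-up raw field values for illustration only
raw_fields = ["v 79 n 4 Apr 1996 p 1034-1040", "SAE Paper 1989", "n.d."]
years = [apply_thesaurus(f, YEAR_THESAURUS) for f in raw_fields]
```

Page ranges such as 1034-1040 are skipped because the pattern only accepts years beginning with 19 or 20.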
Clean-up Corporate Source Field
• Automatic
• with Manual
confirmation
– Optional, but
recommended
• Save and/or
merge clean-up
operations into a
thesaurus for reuse
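One way to picture the clean-up step: propose merges automatically, let the analyst review them, then keep the result as a reusable mapping. The normalization rule, the suffix list, and the organization names below are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed list of company suffixes to ignore when comparing names
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "company"}

def normalize(name):
    """Lowercase, drop punctuation, and drop common company suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in SUFFIXES)

def propose_merges(names):
    """Map each variant to the first-seen variant with the same normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {v: variants[0] for variants in groups.values() for v in variants}

# Made-up names for illustration only
raw = ["Acme Engines Inc.", "ACME ENGINES, INC", "Acme Engines", "Example State Univ."]
thesaurus = propose_merges(raw)   # an analyst would confirm or edit this mapping
cleaned = [thesaurus[n] for n in raw]
```

Because the mapping is a plain dictionary, it can be saved and applied to later data sets, which is the reuse idea described above.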
Categorize Corporate Sources
• Create groups using
a thesaurus
– Corporation
– Laboratory
– University
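A minimal sketch of thesaurus groups for the three segments, assuming simple keyword patterns. The patterns and the fallback label "Unclassified" are hypothetical, and a real thesaurus would need many more entries.

```python
import re

# Hypothetical group definitions, checked in order (first match wins)
SEGMENT_THESAURUS = [
    ("University", re.compile(r"\b(univ|university|college)\b", re.I)),
    ("Laboratory", re.compile(r"\b(lab|labs|laboratory|laboratories)\b", re.I)),
    ("Corporation", re.compile(r"\b(inc|corp|corporation|co|ltd)\b", re.I)),
]

def segment(source):
    """Return the industry segment label for a corporate source string."""
    for label, pattern in SEGMENT_THESAURUS:
        if pattern.search(source):
            return label
    return "Unclassified"

# Made-up names for illustration only
labels = [segment(s) for s in
          ["Example State Univ.", "Example National Lab.", "Acme Engines Inc.", "Unknown"]]
```

The order of the groups matters here: a name such as "Acme Laboratories Inc." would be labeled Laboratory because that pattern is tested before Corporation.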
Cumulative Publication Trends
• Co-occurrence
matrix
• Plot in Excel
using VBScript
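The same idea can be sketched outside Excel: count records per (year, segment) pair, then accumulate along the year axis. The records below are made up; the original work used VantagePoint and VBScript, not this code.

```python
from collections import Counter
from itertools import accumulate

# Made-up (publication year, industry segment) pairs for illustration only
records = [
    (1984, "University"), (1985, "Corporation"), (1985, "Corporation"),
    (1986, "Laboratory"), (1986, "Corporation"), (1986, "University"),
]

counts = Counter(records)
years = sorted({year for year, _ in records})
segments = sorted({seg for _, seg in records})

# Co-occurrence matrix: one row per segment, one column per year
matrix = {seg: [counts[(year, seg)] for year in years] for seg in segments}

# Cumulative trend: running total across years for each segment
cumulative = {seg: list(accumulate(row)) for seg, row in matrix.items()}
```

Each row of cumulative is one line on the trend plot.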
So …
• It looks like something significant happened
in the mid-80s to radically increase the
publication rate.
• It also looks like things have slowed down
in (then) recent years (early- to mid-90s).
• Has the technology matured?
• Is it ready to transition?
Analyze Selected Areas
Forecasting – Counting Patents
• Plot
• Model
(Coeff. Det.
> 0.95)
• Predict
“significant
technology
growth in the
next 9 years”
• Monitor
(Leap to today)
U.S. Patents – Ceramic Coating (cumulative)
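The plot, model, predict sequence can be sketched as follows. The slides do not say which growth model was used, so the logistic (S-curve) form, the assumed ceiling LIMIT, and the synthetic counts are all assumptions for illustration, not the actual patent data or method.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(observed, predicted):
    """Coefficient of determination."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def logistic(t, limit, slope, intercept):
    return limit / (1 + math.exp(-(intercept + slope * t)))

LIMIT = 500.0                      # assumed ceiling on cumulative count
t_values = list(range(10))         # years since start of series
# Synthetic cumulative counts generated from a known curve (not real data)
counts = [round(logistic(t, LIMIT, 0.6, -4.0)) for t in t_values]

# Linearize: ln(y / (LIMIT - y)) = intercept + slope * t, then fit a line
transformed = [math.log(c / (LIMIT - c)) for c in counts]
slope, intercept = fit_line(t_values, transformed)

fitted = [logistic(t, LIMIT, slope, intercept) for t in t_values]
r2 = r_squared(counts, fitted)
forecast = logistic(t_values[-1] + 9, LIMIT, slope, intercept)  # 9 years ahead
```

With real counts, a coefficient of determination above 0.95 would be the kind of threshold the slide mentions before trusting the projection.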
Look for existing forecasts
• Create groups of
records using a
thesaurus
(e.g. “forecast”
“projection”
“future”
“market study”)
• Made “leap” to
electronics industry
(screen-shot label: PIEZOELECTRIC DEVICES - Manufacture)
• Identified expert
• The rest is history
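Grouping records that mention existing forecasts can be sketched with a single pattern built from the terms listed above. The record structure and the sample titles are made up for illustration.

```python
import re

# Pattern built from the terms on the slide, with simple word variants
FORECAST_GROUP = re.compile(
    r"\b(forecast\w*|projection\w*|future|market stud(y|ies))\b", re.I
)

# Made-up records for illustration only
records = [
    {"id": 1, "title": "Market study of ceramic coatings"},
    {"id": 2, "title": "Wear testing of coated piston rings"},
    {"id": 3, "title": "Forecasting demand for thermal-spray equipment"},
]

forecast_ids = [r["id"] for r in records if FORECAST_GROUP.search(r["title"])]
```

The resulting group is a short reading list for the analyst, which is how an expert or a neighboring industry might surface.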
Results
• 1997
– Prepared case for new effort on reconditioning engine
components using thermal-spray coatings
– U.S. Army invested $2M in 5-year R&D program
• Today
– Installing reconditioning equipment at Red River Army
Depot for production use
– Expected near-term payback ~ $5.5M
References and Contact
Information
• “Innovation Forecasting” by Robert J. Watts and
Alan L. Porter, Technological Forecasting and
Social Change, 56, 25-47 (1997).
• Paul Frey, Search Technology, Inc., Atlanta, GA
– [email protected]
– 770.441.1457
• Web Sites
– www.searchtech.com
– www.TheVantagePoint.com