The application of commercial visualization tools to the


Applying Existing Technology to Exploitation of Multiple Sources of Information
Mike Brenton
Sterling Software
Memex Technology Limited
Problem Statement
• First there was data overload.
• Now there is an overabundance of tool power.
Information Types
Sterling Software Announces 2-For-1 Stock Split
DALLAS, Texas (March 11, 1998) - Sterling Software, Inc. (SSW-NYSE) today announced that
its Board of Directors has approved a 2-for-1 split of the company’s common stock. Stockholders
will receive one additional common share for every share held on the record date of March 20,
1998. The additional shares will be issued on April 3, 1998.
Sterling Software currently has approximately 38.8 million shares of common stock outstanding.
This number will double to approximately 77.6 million shares by reason of the stock split.
Sterling L. Williams, president and chief executive officer of Sterling Software commented,
"Sterling Software’s stock price increased 30% during 1997 and 28% so far this year, based on
consistently excellent performance by the company. We decided to split our stock to improve its
trading liquidity and to help ensure that it trades in a price range that is accessible to a broad base
of investors."
Sterling Software is a leading provider of software and services for the applications management,
systems management and federal systems markets. Sterling Software, with its headquarters in
Dallas, has a worldwide installed base of more than 20,000 customer sites and has 3,100
employees in 85 offices worldwide. For more information on Sterling Software, visit the
company’s Web site at http://www.sterling.com.
Contact:
Julie Kupp
Vice President, Investor Relations
Sterling Software, Inc.
(214) 981-1000
[email protected]
©Copyright Sterling Software, 1998 All rights reserved

First Name   Last Name   Project   Phone #   Office #
Michael      Brenton     MDITDS    x7060     249
Harold       Kornegay    MDITDS    x7049     221

The Migration Defense Intelligence Threat Data System (MDITDS) is a
Department of Defense Intelligence Information Systems (DODIIS)
designated migration system tasked to provide the automated production
system for the DODIIS Indications and Warnings (I&W), Counterintelligence
(CI), Anti-terrorism (AT), Counterterrorism (CT), Information Warfare (IW),
Arms Proliferation (AP), and Defense Industry (DI) communities.
Open Source Materials
• Electronic Information
– Library Services
– On-line Newspapers
– On-line Reports
– Information Brokers
– CD-ROM Products
– Wire Services
• Agents
– Services - People
– Services - Push and Punch
– Spiders, Crawlers, and Profilers
Tools
• Data Warehouses
• Concept Analysis and
Summarization
• Vectors, Clustering,
Histograms
• Data Mining
• OLAP
• Statistical Analysis
• Visualization
• Information Extraction
• Temporal Analysis
• Link Analysis
Data Warehouses
• Data warehousing is an
emerging technology that
supports non-operational
application areas like
management information
systems, decision support, and
data mining.
• A data warehouse is a database
that provides efficient and
integrated access to relevant
analytical data.
Department of Information Science - The Aarhus School of Business
Memex Information Engine and Client Applications
[Figure: screenshots of the Memex Network Query Tool (menu bar: File, Edit, View, Insert)]
Text Search results (rank -- score -- title):
1 -- 90% -- Air Power Over Bosnia
2 -- 85% -- UK Air Power and NATO
3 -- 70% -- Air Power Assessment
4 -- 65% -- Munitions on Tactical Fighters
5 -- 50% -- Tactical Fighters and LSB
6 -- 50% -- Smart Munitions
7 -- 45% -- Air Dropped Land Mines
Field Search:
Name: __________________________
Incident Type: ____________________
Organization Type: ________________
Equipment: ______________________
Start Date:_________ Stop Date:_________
Products
• Country Profiles
• Group (Unit) Profiles
• Individual Profiles
• Incidents (Events)
• Misc. Assessments
• All
Domains
• Counter Intelligence
• Counter Terrorism
• Force Protection
• Arms Proliferation
• Defense Industries
• Indications & Warning
• All
Network
DIA, EUCOM, JICPAC, SOUTHCOM, CENTCOM, STRATCOM, SPACECOM, TRANSCOM
Concept Analysis and Summarization
• Concept analysis is the process
of matching keywords in the
text to hierarchical topic trees
in order to determine the major
theme(s) in the document,
paragraph, or sentence.
• Some systems use this
information and predetermined
“templates” to build summaries
of a document.
• The concepts and summaries
are then used to route
documents to analysts.
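The keyword-to-topic-tree matching described above can be sketched in a few lines of Python. This is only an illustration: the topic tree, its keywords, and the function names are hypothetical, and a real concept analysis product would use far richer matching than exact word counts.

```python
import re
from collections import Counter

# Hypothetical two-level topic tree: parent topic -> leaf topic -> keywords.
TOPIC_TREE = {
    "Air Power": {
        "Munitions": ["munitions", "bomb", "missile"],
        "Aircraft": ["fighter", "aircraft", "sortie"],
    },
    "Finance": {
        "Equities": ["stock", "shares", "split"],
    },
}

def score_topics(text):
    """Count keyword hits per leaf topic and roll them up to the parent topic."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = Counter()
    for parent, leaves in TOPIC_TREE.items():
        for leaf, keywords in leaves.items():
            hits = sum(words[k] for k in keywords)
            if hits:
                scores[f"{parent}/{leaf}"] = hits
                scores[parent] += hits
    return scores

def major_themes(text, top_n=2):
    """Return the highest-scoring topics, which could then drive routing."""
    return [topic for topic, _ in score_topics(text).most_common(top_n)]
```

Calling `major_themes("The board approved a stock split; shares will double.")` ranks the hypothetical Finance topics first, and a router could map those topics to analyst queues.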
Vectors, Clustering, and Histograms
• Document clustering is a
technique for automatically
discovering the subtopics in a
set of documents and grouping
the documents by those
subtopics.
• Organizing documents by
subtopic can help you get a
sense of the major subject areas
covered in the document set…
Verity, Inc.
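As a rough illustration of document clustering, the sketch below turns each document into a word-count vector and groups documents whose cosine similarity to a cluster's first member passes a threshold. The single-pass strategy, the threshold value, and the function names are assumptions chosen for brevity; they are not the method used by any product named in this deck.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a document as a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Single-pass clustering: a document joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]
```

With the titles "air power over bosnia", "air power assessment", "smart munitions", and "air dropped land mines", the first two share enough terms to fall into one group while the others start their own.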
Data Mining
• Data mining is the analysis of
data for relationships that have
not previously been discovered.
• For example, the sales records
for a particular brand of tennis
racket might, if sufficiently
analyzed and related to other
market data, reveal a seasonal
correlation with the purchase
by the same parties of golf
equipment.
whatis.com Inc.
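A very small sketch of the tennis racket and golf equipment example: group purchases by customer and count which item pairs were bought by the same party. The sales records and names are invented for illustration, and the sketch covers only the co-purchase count, not the seasonal aspect of the correlation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales records: (customer, month, item).
SALES = [
    ("c1", "Apr", "tennis racket"), ("c1", "Apr", "golf clubs"),
    ("c2", "Apr", "tennis racket"), ("c2", "May", "golf clubs"),
    ("c3", "Nov", "tennis racket"),
    ("c4", "May", "golf clubs"),
]

def co_purchases(sales):
    """Count how many customers bought each pair of items."""
    baskets = {}
    for customer, _month, item in sales:
        baskets.setdefault(customer, set()).add(item)
    pairs = Counter()
    for items in baskets.values():
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(items), 2):
            pairs[pair] += 1
    return pairs
```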
OLAP
• OLAP (online analytical
processing) enables a user to easily
and selectively extract and view
data from different points-of-view.
• For example, display a spreadsheet
showing all of a company's beach
ball products sold in Florida in the
month of July, 1997, then compare
revenue figures with those for the
same products in July, 1996, and
so on.
whatis.com Inc.
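The beach ball example amounts to slicing a table of facts on some dimensions and rolling revenue up along another. The sketch below assumes a tiny in-memory fact table with invented figures; the dimension names and helper functions are hypothetical and stand in for what an OLAP server does at scale.

```python
from collections import defaultdict

DIMS = ("product", "state", "year", "month")

# Hypothetical fact rows: (product, state, year, month, revenue).
FACTS = [
    ("beach ball", "FL", 1997, 7, 1200.0),
    ("beach ball", "FL", 1996, 7, 1000.0),
    ("beach ball", "GA", 1997, 7, 300.0),
    ("sunscreen", "FL", 1997, 7, 800.0),
]

def slice_cube(facts, **fixed):
    """Keep only the rows whose dimensions match the fixed values."""
    return [r for r in facts if all(r[DIMS.index(k)] == v for k, v in fixed.items())]

def rollup(facts, dims):
    """Sum revenue grouped by the chosen dimensions."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[4]
    return dict(totals)

# Beach balls sold in Florida in July, compared year over year.
florida_july = slice_cube(FACTS, product="beach ball", state="FL", month=7)
by_year = rollup(florida_july, ["year"])
```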
Statistical Analysis
• The collection, classification,
and interpretation of numerical
data.
• Elements of statistics are
present in most OLAP tool sets.
• Functions include: Frequency
Distribution, Average, Mean,
Standard Deviation, etc.
• Functions found in most
spreadsheet applications.
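The functions listed above are available in Python's standard library. The sketch below is a minimal example using sample values; the `summarize` helper is a hypothetical name, and population standard deviation is one of several reasonable choices.

```python
import statistics
from collections import Counter

def summarize(values):
    """Return a frequency distribution, the mean, and the population standard deviation."""
    return {
        "frequency": dict(Counter(values)),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

# Sample values, for example relevance scores of a result list.
summary = summarize([90, 85, 70, 65, 50, 50, 45])
```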
Visualization
• Visualization is the process of
representing abstract business
or scientific data as images that
can aid in understanding the
meaning of the data.
• Visual computing is computing
that lets you interact with and
control work through
visualization.
whatis.com Inc.
Information Extraction
• Automated information
extraction involves the
identification and extraction of
information about specified
classes of events and the filling
of templates for each instance
of such an event.
• Operates against pure text.
• Also known as NLU or NLP.
• Naval Research and Development
group (NRaD) of NOSC
Temporal Analysis
• Temporal analysis is the
process of evaluating
information, events and
activities in light of models
which encompass the concept
of time or sequence and time.
• Model sequences incorporate a
timeframe constraint on the
identified events.
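One way to picture a model sequence with a timeframe constraint is a check that a list of event types occurs in order, all within a fixed window of the first event. The event type labels and the `matches_sequence` helper below are assumptions for illustration; the dates are taken from the stock split announcement used elsewhere in this deck.

```python
from datetime import date, timedelta

def matches_sequence(events, pattern, window):
    """True if the event types in `pattern` occur in order within `window` of the first match.

    `events` is a list of (date, event_type) tuples in any order.
    """
    if not pattern:
        return True
    ordered = sorted(events, key=lambda e: e[0])
    for start_idx, (start_date, kind) in enumerate(ordered):
        if kind != pattern[0]:
            continue
        step = 1
        for when, later_kind in ordered[start_idx + 1:]:
            if when - start_date > window:
                break  # timeframe constraint violated for this starting event
            if step < len(pattern) and later_kind == pattern[step]:
                step += 1
        if step == len(pattern):
            return True
    return False

# Hypothetical event labels attached to dates from the press release.
EVENTS = [
    (date(1998, 3, 11), "announcement"),
    (date(1998, 3, 20), "record date"),
    (date(1998, 4, 3), "share issue"),
]
PATTERN = ["announcement", "record date", "share issue"]
```

`matches_sequence(EVENTS, PATTERN, timedelta(days=30))` succeeds because all three events fall within 23 days, while a 14-day window rejects the same sequence.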
Link Analysis
• Link analysis provides the
ability to investigate
relationships between people,
places, events, and things.
• Ideally, it is a mechanism to
“walk through” a data
warehouse following those
links which have meaning
relevant to the immediate
problem.
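The "walk through" idea can be sketched as a graph search: entities are nodes, known relationships are edges, and a breadth-first search returns the shortest chain of links between two entities. The link list below is a hypothetical example built from entities in the press release; the function names are illustrative only.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of links from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical links between people, places, and organizations.
LINKS = [
    ("Sterling Software", "Board of Directors"),
    ("Sterling Software", "Stockholders"),
    ("Sterling Software", "Dallas, Texas"),
]
GRAPH = build_graph(LINKS)
```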
Tools are nice but...
• There has to be a reason:
• Analysis of operational data
• Analysis of associated data
• Discovering new relationships
• Discovering new trends
• Gaining new insights into your
business
• Competitive Edge
Different Tools for Different Kinds of Discovery
Information Extraction
• Translating text reports (prose) into
“tagged data”
• Evaluating the tagged data to
extract information
• Commonly referred to as Natural
Language Understanding or
Processing
A Focus on the Analysis of Textual Information
• Typical process flow
– Receipt
– Auto-analysis
• Classification
• Extraction
– Archive
– Visualization
[Figure: process flow diagram. Labels: Wire Service; Government Traffic; Traffic Receipt; Analyst Queues; Process; Review; Analyze; Think; Ignore; Update Assessment; Update Queue Profiles; Ten-Plus-Year Repository]
Making the Information Usable
Sterling Software Announces 2-For-1 Stock Split
DALLAS, Texas (March 11, 1998) - Mr. Sterling Williams of
Sterling Software, Inc. (SSW-NYSE) today announced that the
company's Board of Directors has approved a 2-for-1 split of the
company’s common stock. Stockholders will receive one
additional common share for every share held on the record date
of March 20, 1998. The additional shares will be issued on April
3, 1998.
Sterling Software currently has approximately 38.8 million
shares of common stock outstanding. This number will double to
approximately 77.6 million shares by reason of the stock split.
Org        Sterling Software     US Corporation
Group      Board of Directors    Sterling Software
Group      Stockholders          Sterling Software
Location   Dallas, Texas
Object     stock shares          38.8 million
Object     stock shares          77.6 million
Date       11-Mar-98
Date       3-Apr-98
Date       20-Mar-98
Event      meeting               Board of Directors
Event      stock split           20-Mar-98
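To make the step from prose to tagged data concrete, here is a deliberately narrow sketch that pulls only the Date and share-count rows out of the announcement text using regular expressions. The patterns, the record layout, and the `extract` function are assumptions for illustration; real extraction systems rely on linguistic analysis rather than two hand-written patterns.

```python
import re

MONTHS = ("January February March April May June July "
          "August September October November December").split()
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})\b")
SHARES_RE = re.compile(r"(\d+(?:\.\d+)?)\s+million\s+shares")

SAMPLE = """DALLAS, Texas (March 11, 1998) - Stockholders will receive one
additional common share for every share held on the record date
of March 20, 1998. The additional shares will be issued on April
3, 1998.
Sterling Software currently has approximately 38.8 million
shares of common stock outstanding. This number will double to
approximately 77.6 million shares by reason of the stock split."""

def extract(text):
    """Return (type, value, attribute) records for dates and share counts."""
    records = []
    for month, day, year in DATE_RE.findall(text):
        record = ("Date", f"{int(day)}-{month[:3]}-{year[2:]}", "")
        if record not in records:
            records.append(record)
    for amount in SHARES_RE.findall(text):
        records.append(("Object", "stock shares", f"{amount} million"))
    return records
```

Running `extract(SAMPLE)` yields three Date records and two Object records, which is the kind of tagged data that later correlation and analysis steps can operate on.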
Information Extraction is not Information Retrieval
Information retrieval gets sets of relevant
documents -- you analyze the documents
Information extraction gets facts out of
documents -- you analyze the facts
Natural Language Processing Group, The University of Sheffield
Why is Information Extraction Difficult?
• There are many ways of expressing the
same fact:
– BNC Holdings Inc named Ms G Torretta as its new
chairman.
– Nicholas Andrews was succeeded by Gina Torretta
as chairman of BNC Holdings Inc.
– Ms. Gina Torretta took the helm at BNC Holdings
Inc.
• Information may need to be combined
across several sentences:
– After a long boardroom struggle, Mr Andrews
stepped down as chairman of BNC Holdings Inc.
He was succeeded by Ms Torretta.
Natural Language Processing Group, The University of Sheffield
Information Extraction (Document)
• Natural Language Understanding
[Figure: processing pipeline. Labels: Article; Lexical Analysis; Reduction; Simple Relations; Common Events; Coreference; Records; Domain Events]
Correlation of Extracted Information (Other Documents)
• The events in a single document
are relevant to routing the
document.
• But a single meeting (event) put
in context of other meetings
(events) becomes much more
useful.
• Manual vs. Automated Process
• User interest profiles, e.g.,
– Membership
– Meeting (Communication) Events
– Relocation (Movement) Events
Using Correlated Data (Mining Text (or other) Databases)
• What would the user do if they
knew how to use the
visualization tools?
• Automate the process:
– Use names of people and
organizations for data mining.
– Use temporal analysis to align
(chronologically) the events.
– Use link analysis to establish
networks of people and things,
e.g., vehicles.
• Present the user with organized
information.
Monitor the success of the process and feed back the results into the system.
Summary
• Still faced with a tremendous amount of
data.
• Tools are available for acquiring
information relevant to your business.
• Tools to perform data mining over a
substantial data warehouse require a
commitment to:
– Money
– Time
– Training
– Personnel
• The results are:
Thank you
Mike Brenton
Sterling Software
www.sterling.com
[email protected]
------------
Memex Technology Limited
www.memex.co.uk
------------
Jim Basara
Memex, Inc.
[email protected]