OSIC: A Cost-Sharing Approach to Open Source Information
Presented by:
Scott Mutton
PM – Security & Intelligence
xwave
13 April 2004
Agenda
• Fly through basic concepts (2 min)
• The OSIC Solution (5 min)
• Functionality of an OSIC (15 min)
• Savings and Benefits (5 min)
• Costs and Risks (5 min)
• Questions as time (Robert) permits
OSINF & OSINT
Basic Concepts
• OSINF = Open Source Information
• Information that can be legally and morally
obtained either freely or by paying a fee
• Some question the “morally” caveat
• OSINT = Open Source Intelligence
• OSINT = OSINF that has been analyzed,
categorized, filtered, or validated through
some intelligence-driven process
Security Implications
Basic Concepts
• By its nature, OSINF is “unclassified”
• Sometimes, though, the fact that a particular
person/topic is even of interest is classified
• OSINT may be classified
• depends on the nature of the analysis,
categorization, etc.
Some Sources of OSINF
Basic Concepts
• Meetings and presentations
• Places of worship
• Books & Plays
• Can be fact or fiction
• Fiction can provide insight into culture
• Newspapers & periodicals
• also flyers, brochures, etc.
• Movies
• Radio & Television
• Unclassified reports and studies
• Internet
Brief History of Internet Info
Basic Concepts
• 15 years ago: estimated to be 100-200 million articles of data on the Internet
• 5 years ago: Web added 200-300 million articles
• 2003: over 3 billion web pages and over 400 million images
• Estimate: volume of info on the Internet doubling every 12-18 months
The OSIC Solution
• Establish an OSINF infrastructure which
• Greatly improves the ability of people to get info from the Internet
• Provides tools to help them understand the info they’ve found
• Organizations in the S & I community
• Share the OSINF infrastructure
• Each do their own processing to generate OSINT
• xwave
• Owns & operates the infrastructure
• Negotiates collective data purchasing agreements
• Conducts ongoing tool reviews and infrastructure improvements
Chain of OSINT Processes
OSIC Solution
[Diagram: process chain running Access, Collect, Collate, Reformat (Optional), Analyze, Publish, Distribute, Feedback. Roles shown: Researchers, Librarians, Analysts. Other labels: RFI Info, Mandatory, Helpful, Optional.]
Note: The process likely becomes classified at the point where RFI info is introduced
A Shared OSINF Infrastructure
OSIC Solution
[Diagram: Org A, Org B, and Org C each carry out their own organization-specific functions (Reformat, Analyze, Publish, Distribute, Feedback) on top of one shared set of generic functions (Access, Collect, Collate), the technology infrastructure.]
Components of OSIC
Functionality
• Federated Search
• Focus on specific websites
• Access to fee-for-data sites
• Multi-lingual
• Multi-media
• Local OSINF repository
• Data Mining
• Collaboration
Federated Search
Functionality
• A “federated search engine” is one that
• Takes a single user-query
• Passes it on to other independent search engines
• Collects the returns from each search engine
• Removes duplicates
• Prioritizes the collection
• Presents the user with a single return list
• The OSIC incorporates Copernic Empower
• a commercial federated search engine that can fan out to over
1,000 different Internet search engines
• Can also “log into” fee-for-service sites
• Xerox’s “AskOnce” an alternative under consideration
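The six federated-search steps on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not a description of how Copernic Empower or AskOnce work internally. The `Hit` record, the engine callables, the URL-based duplicate key, and the "keep the best score" ranking rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Hit:
    url: str
    title: str
    score: float  # engine-reported relevance; assumed comparable across engines

# Hypothetical interface: an "engine" is any callable that maps a query to hits.
Engine = Callable[[str], Iterable[Hit]]

def federated_search(query: str, engines: list[Engine]) -> list[Hit]:
    best: dict[str, Hit] = {}
    for engine in engines:                     # pass the single query to each engine
        for hit in engine(query):              # collect the returns
            key = hit.url.rstrip("/").lower()  # crude duplicate key (assumption)
            if key not in best or hit.score > best[key].score:
                best[key] = hit                # drop duplicates, keep the best copy
    # prioritize the collection and present one return list
    return sorted(best.values(), key=lambda h: h.score, reverse=True)

# Two stand-in engines with canned results, for illustration only.
def engine_a(query: str) -> list[Hit]:
    return [Hit("http://example.org/a", "A", 0.9),
            Hit("http://example.org/b", "B", 0.4)]

def engine_b(query: str) -> list[Hit]:
    return [Hit("http://example.org/b/", "B again", 0.7)]

results = federated_search("osint", [engine_a, engine_b])
```

Here the two copies of page "b" collapse into one entry, and the merged list is ordered by score. A real product would need a far more careful notion of "duplicate" and of cross-engine ranking.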
Focus on Websites
Functionality
• May want to spider and index some sites directly, rather
than relying on commercial search engines
• May want to ignore the robots.txt file
• May want to access sites not indexed by commercial engines
• May want to Data Mine the info
• The OSIC uses Autonomy for indexing, data mining, and
other advanced functionality
• May want to monitor certain sites/pages for changes
• Empower or AskOnce
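One simple way to monitor pages for changes is to store a fingerprint of each page and compare it on the next visit. The sketch below assumes that approach; the slide does not say how Empower or AskOnce detect changes, and the function names are hypothetical. Fetching the pages is left out so the example stays self-contained.

```python
import hashlib

def fingerprint(page_bytes: bytes) -> str:
    # SHA-256 digest of the raw page content
    return hashlib.sha256(page_bytes).hexdigest()

def changed_pages(previous: dict[str, str], current_pages: dict[str, bytes]) -> list[str]:
    """Return URLs that are new or whose content differs from the stored fingerprint."""
    changed = []
    for url, body in current_pages.items():
        if previous.get(url) != fingerprint(body):
            changed.append(url)
    return changed

# Fingerprints saved on the last visit (illustrative data).
stored = {
    "http://example.org/news": fingerprint(b"version 1"),
    "http://example.org/about": fingerprint(b"version 1"),
}
# Content seen on this visit.
fetched = {
    "http://example.org/news": b"version 2",
    "http://example.org/about": b"version 1",
}
alerts = changed_pages(stored, fetched)
```

A byte-level hash flags any change at all, including rotating ads or timestamps, so a practical monitor would likely compare extracted text instead.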
Fee-for-Data Sites
Functionality
• A great deal of good OSINF needs to be purchased
• BBC World Monitoring (FBIS), Canadian Press, LexisNexis, etc.
• Significant economies of scale for bulk data
• A quote from one data vendor
• 11-30 users costs “X” dollars
• 31-300 users costs “2X” dollars
• 15 users pay 5 times more per-user than 150 users
• Tools exist to allow seats to be “shared” for sites that
require interactive access
• For example, might share 5 seats across 150 users
• Price per shared seat high, but much less than 30 normal seats
• The OSIC looking at Tarantella tool to control use
• A similar solution in use in Canada’s Foreign Affairs department
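The per-user arithmetic behind the vendor quote can be checked directly. In this sketch the undisclosed price "X" is normalized to 1, which is an assumption for illustration only; the ratio does not depend on it.

```python
from fractions import Fraction

def per_user_cost(tier_price: Fraction, users: int) -> Fraction:
    # Total tier price spread evenly over the users
    return tier_price / users

X = Fraction(1)  # the vendor's price "X", normalized (assumption)

small_group = per_user_cost(X, 15)       # 15 users fall in the 11-30 tier
large_group = per_user_cost(2 * X, 150)  # 150 users fall in the 31-300 tier
ratio = small_group / large_group        # how much more each small-group user pays

# With 5 seats shared across 150 users, each shared seat covers this many users.
users_per_shared_seat = 150 // 5
```

The small group pays X/15 per user and the large group pays 2X/150 = X/75 per user, which gives the factor of 5 quoted on the slide. Each shared seat covers 30 users, which is where the "30 normal seats" comparison comes from.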
Multi-Lingual
Functionality
• A tremendous amount of needed information not
in English
• Need to be able to
• Recognize the language used in a document
• Be able to conduct “native searches”
• E.g. an Arabic query to get Arabic info
• Be able to conduct “cross-language” searches
• E.g. an English query to get Arabic info
• Do “gist” translations from one language to another
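Recognizing the language of a document usually starts with something coarser: recognizing its script. The sketch below does only that, using Unicode character names from the Python standard library. It is an assumption-laden first step, not a language identifier: it cannot tell English from French, and the slide names no specific method or tool for this task.

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Guess the most common script among the letters in `text`."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names begin with the script, e.g. "ARABIC LETTER MEEM"
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

arabic_guess = dominant_script("مرحبا بالعالم")
latin_guess = dominant_script("Open source information")
```

A script guess like this could route a document to the right index for a "native search". Telling apart languages that share a script needs a statistical model trained on sample text.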
Multi-Media
Functionality
• There can be significant benefit in being able to
apply automation to
• Monitor radio or television broadcasts
• Search stored audio/video files for content
• Voice-to-text transcription improving
• A key factor is to be able to maintain link between text
and original audio/video
• Another situation where multi-lingual
functionality needed
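Keeping the link between transcribed text and the original audio/video can be as simple as storing time offsets with each piece of text. The record layout below is a hypothetical sketch, not the format used by Autonomy or any other tool named in this presentation.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    media_file: str  # the original recording
    start_s: float   # offset where this text begins, in seconds
    end_s: float     # offset where it ends
    text: str        # voice-to-text output for this stretch

def find_in_transcript(segments: list[Segment], term: str) -> list[tuple[str, float]]:
    """Return (file, start offset) for every segment whose text mentions `term`."""
    term = term.lower()
    return [(s.media_file, s.start_s) for s in segments if term in s.text.lower()]

# Illustrative data only.
broadcast = [
    Segment("news_0413.wav", 0.0, 12.5, "Good evening, here are the headlines"),
    Segment("news_0413.wav", 12.5, 31.0, "Port authorities reported new shipping delays"),
]
matches = find_in_transcript(broadcast, "shipping")
```

A text hit then points the analyst to the exact spot in the recording, so the original can be replayed to check a doubtful transcription.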
Local Repository
Functionality
• Unclassified OSINF/OSINT that is not
posted to the Internet
• Could be product created by OSIC users
• Some Internet docs may be of such lasting
value that they’re worth saving
• Data mining tools can be readily applied
Data Mining
Functionality
• Tools exist to analyze volumes of numeric
and text data and
• Identify trends and clusters
• Allow users to easily “walk through” the data
• A significant benefit to the OSIC is that
currently such tools are expensive and
technically challenging to set up
Collaboration
Functionality
• Simple file sharing
• Organizations connecting to the OSIC add
UNCLASS product to local repository
• Connecting users with questions to users
with answers
• Possibly some vehicle for chat
• Likely security issues
Components of OSIC
Functionality - Summary
• Federated Search: Empower, AskOnce
• Focus on specific websites: Empower, AskOnce, Autonomy
• Access fee-for-data sites: Empower, AskOnce, Tarantella
• Multi-lingual: Empower, AskOnce, Autonomy
• Multi-media: Autonomy
• Local OSINF repository: Simple files for now
• Data Mining: Autonomy
• Collaboration: Autonomy (partial)
Savings and Benefits
• Cost sharing across multiple user groups
• Savings through economies of scale
• Savings through shared access
• Ongoing evaluation and adoption of new technologies
• Sharing of knowledge
• Easily implemented
Cost Sharing
Savings and Benefits
• One set of HW and SW serves entire
community
• Users access through Web Browser
• Shared operations and maintenance
• One set of HW/SW support agreements
• One system administrator
• If scale/need dictates, may have more than one
and provide 24/7 support
Economies of Scale
Savings and Benefits
• When purchasing data, size matters
• A quote from one data vendor:
• 11-30 users costs “X” dollars
• 31-300 users costs “2X” dollars
• 15 users pay 5 times more per-user than 150
users
Shared Access
Savings and Benefits
• Some data vendors require individual
licenses to log into their systems
• The OSIC can potentially arrange for
shared licenses
• Higher cost per license, but many fewer
licenses needed
• Technology can enforce sharing limits, so
vendors can be confident they’re not being abused
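Enforcing a sharing limit amounts to capping the number of concurrent logins at the number of licensed seats. The sketch below shows that idea with a counting semaphore. It is an assumption about the general technique, not a description of how Tarantella or any vendor system works, and the `SeatPool` class is hypothetical.

```python
import threading

class SeatPool:
    """Caps concurrent sessions at the number of shared licenses."""

    def __init__(self, seats: int):
        self._sem = threading.BoundedSemaphore(seats)

    def try_login(self) -> bool:
        # Non-blocking: returns False at once if every seat is in use
        return self._sem.acquire(blocking=False)

    def logout(self) -> None:
        # Frees one seat for the next user
        self._sem.release()

pool = SeatPool(5)                              # 5 shared seats
granted = [pool.try_login() for _ in range(6)]  # six users try to log in
pool.logout()                                   # one user finishes
```

The sixth login is refused until a seat is freed. The vendor never sees more than five concurrent sessions, however many users sit behind the pool.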
Access to New Technologies
Savings and Benefits
• OSIC has mandate to investigate new
tools and technologies for
• Search & retrieve
• Data mining
• Knowledge sharing
• If a tool is put into OSIC, can be
immediately available to all
• May be some benefits to entire community
using common toolset
Knowledge Sharing
Savings and Benefits
• If one individual finds a useful document or URL,
can be made available to entire community
• If users wish, they can make available to others
• Their specific queries
• Their areas of interest
• If OSIC used to “contract out” unclassified
research, then expertise of who to call for what
is centralized
Ease of Implementation
Savings and Benefits
• Subscribers don’t have to deal with
• Evaluations and processes associated with a
system purchase or development
• Staffing to support
• Licensing with SW or data vendors
• Training preparation or delivery
• Service can be available within days of
signing up for subscription
Costs and Risks
• Operating model
• Different organizations have different
requirements
• Security concerns
Operating Model
Costs and Risks
• The OSIC, as a contractor-owned user-subscription
model, has some difficulties
• Affordable fees depend on significant community participation
• 50-100 users likely a viable starting number
• Subscription could be per-seat or per-organization
• Problem: A number of orgs want to wait until “others go first”
• Problem: This is a model new to Canadian Federal contracting
processes
• A government owned & operated model also has
difficulties
• Arriving at a fair way to distribute costs can be a problem
• Particularly if different organizations have very different data
acquisition or system utilization needs
• The contracting process to establish such a capability very long
Organization Requirements
Costs and Risks
• There may be issues if parts of the
community have substantially different
needs for such things as
• Security
• 24/7
• Solutions almost certainly exist, but how
would they be cost-shared?
Security Concerns
Costs and Risks
• Internet-based OSINT requires connectivity
• Some agencies air-gap
• May leave “footprints”
• All collection methods have a comparable problem
• Some believe their work so super-secret that the
merest hint of their interests is classified
• In many cases, this is an exaggeration
Questions?