FIRST
European research for web information extraction
and analysis for supporting financial decision making
EFMA Customer Week - April 2013
Tomás Pariente Lobo – Atos Spain
Motivation
Vision
Innovation
Tools
Why FIRST? - Motivations
The most reliable data sources today…
…also have their weaknesses!
They do not consider unstructured data, rumors, market sentiment, etc.
Why FIRST? - Motivations
Example: Apple iPhone 1 Announcement on 2007-01-09
[Chart: Apple stock price, 03/01/2007 to 17/01/2007; vertical axis from 75 to 100]
The stock price skyrocketed after the announcement.
However, the announcement could have been sensed earlier…
…
2006-05-12: Apple said to be working on co-branded handset
2006-07-29: Peter Oppenheimer: “We’re not sitting around doing nothing”
...
2006-12-15: iPhone may include JAJAH VoIP software
2007-01-09: iPhone Announcement
Why FIRST? - Motivations
Example: Market surveillance via FIRST (the Google news case)
September 2008: Google News announced “United Airlines bankruptcy”.
Within 12 minutes the stock price fell 75%, wiping out US$ 1bn.
The “news” was actually 6 years old…
Plausibility checking will help identify hoaxes: consistency with regulatory news and other sources.
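A plausibility check of this kind can be sketched in a few lines. This is a minimal illustration, not the FIRST implementation: the `NewsItem` fields, the two-day staleness threshold, and the rule that at least one other source must confirm the story are all assumptions made for the example. The dates are illustrative too; only the month (September 2008) and the six-year gap come from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class NewsItem:
    # Hypothetical record; the field names are assumptions for this sketch.
    headline: str
    source: str
    seen_on: date        # day the item surfaced in the stream
    published_on: date   # original publication date of the article


def plausibility_flags(item, corroborating_sources, max_age=timedelta(days=2)):
    """Return the reasons why an item looks implausible (empty list if none)."""
    flags = []
    # A story that resurfaces long after its publication date is suspicious.
    if item.seen_on - item.published_on > max_age:
        flags.append("stale")
    # A story that no other source (e.g. regulatory news) confirms is suspicious.
    others = {s for s in corroborating_sources if s != item.source}
    if not others:
        flags.append("uncorroborated")
    return flags


# Illustrative dates: a six-year-old article resurfacing in September 2008.
item = NewsItem(
    headline="United Airlines bankruptcy",
    source="aggregator",
    seen_on=date(2008, 9, 1),
    published_on=date(2002, 9, 1),
)
print(plausibility_flags(item, corroborating_sources=[]))
# -> ['stale', 'uncorroborated']
```

A production system would need fuzzier matching between stories and sources; the point here is only that both checks named on the slide are cheap to express once publication dates and source lists are available.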
Why FIRST? – Motivations
A growing universe of unstructured data
… how to separate the wheat from the chaff?
Motivation
Vision
Innovation
Tools
FIRST Project
European-funded research project
Project facts
Running from October 2010 until
September 2013
9 partners
More than 30 people
Preliminary results available
More to come...
Stay tuned (http://project-first.eu/)
Who is behind FIRST?
Industrial partners
Academic/Research
SMEs
FIRST Vision
Vision
is to make available the relevant information
of the entire financial information space
(including unreliable, unstructured, sentiment sources)
to the decision maker in near-real time
in an automated way
FIRST Vision
Financial Resources
Structured
AUTOMATION
Acquisition
Unstructured
Blog, analysis, bulletin boards…
Unreliable, poor quality, noisy…
Processing
Analysis
Decision support
Automated data processing
Overall goal: mixing structured information and unstructured web data in specific decision-making processes.
Four steps in the macro-process of converting data into information are tackled, in our solutions tailored to the financial services market:
[Diagram stages: Data stream acquisition, Real-time processing, Sentiment building, Decision Support System]
1. Large-scale data
• Usable by industry partners
• Real-time stream-based methods
2. Information extraction
• Near-duplicates and boilerplate removal
• Language detection, language extract
• Sentiment extraction + classification
3. Integration
• Ontology + Knowledge base
• Lightweight component integration
4. Decision support
• Machine learning + qualitative modeling
• Visualization
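The near-duplicate removal named under step 2 can be illustrated with word shingles and Jaccard similarity. The slides do not say which method FIRST uses, so this technique, the shingle size `k=3`, and the `0.8` threshold are assumptions chosen for the sketch.

```python
def shingles(text, k=3):
    """Set of k-word shingles of a lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(docs, threshold=0.8):
    """Keep each document unless it is too similar to one already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "Apple said to be working on a co-branded handset",
    "Apple said to be working on a co-branded handset today",
    "United Airlines shares fall sharply",
]
print(drop_near_duplicates(docs))
# -> ['Apple said to be working on a co-branded handset',
#     'United Airlines shares fall sharply']
```

The pairwise comparison is quadratic in the number of kept documents, so it only illustrates the idea; a stream of tens of thousands of documents per day would call for an approximate scheme such as MinHash.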
Sentiment from web data streams
Sentiment is extracted from data streams and correlated with events
Sentiment in Financial Services?
Sentiment
cross-over
In Sept 2011, the sentiment turns negative after a long period of positive values.
A big plunge in the price happens shortly after, accompanied by a series of negative events (lost deals, etc.).
Sentiment cross-over happens before price plunge
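Detecting such a cross-over amounts to finding the point where a sentiment series turns negative after a sustained positive run. The sketch below assumes a plain list of sentiment scores per period; the function name, the `min_run` parameter, and the sample values are hypothetical and are not project data.

```python
def find_crossovers(sentiment, min_run=3):
    """Indices where sentiment turns negative after >= min_run positive values."""
    crossovers = []
    run = 0  # length of the current streak of positive values
    for i, value in enumerate(sentiment):
        if value > 0:
            run += 1
        else:
            if value < 0 and run >= min_run:
                crossovers.append(i)
            run = 0
    return crossovers


# Made-up series: four positive periods, then a turn to negative.
series = [0.4, 0.5, 0.3, 0.2, -0.1, -0.3, 0.1, -0.2]
print(find_crossovers(series))
# -> [4]
```

Requiring a minimum positive run keeps short oscillations around zero (such as the last two values above) from raising a signal.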
Motivation
Vision
Innovation
Tools
Mining the Web for financial texts
Data Acquisition pipeline: Web mining
Natural Language preprocessing
and entity extraction
Streaming
Cleaning
Financial terms,
Companies,
Instruments …
Data acquisition after one year
Some numbers
176 Web sites
2,671 RSS sources
~40,000 documents per day
>10,000,000 documents by end of 2012
• And growing
Essential for future evaluation and analysis
Analysing sentiments in Web texts
The Analytical Pipeline: Identify, extract, classify, aggregate
Document with basic annotations
SENTIMENT CLASSIFICATION per object and feature
Document with sentiment sentences
SENTIMENT AGGREGATION per object and feature
Aggregated sentiments
Indicators
Object
Positive sentiment
Sentiment
Sentences
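The aggregation step of this pipeline can be sketched as averaging sentence-level polarities per object and feature. The slides do not specify the aggregation function, so the simple mean, the `(object, feature, polarity)` tuple layout, and the sample company name are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_sentiment(sentences):
    """Average sentence polarity for each (object, feature) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum, count]
    for obj, feature, polarity in sentences:
        entry = totals[(obj, feature)]
        entry[0] += polarity
        entry[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


# Hypothetical output of the classification step: +1 positive, -1 negative.
classified = [
    ("ACME", "price", 1),
    ("ACME", "price", -1),
    ("ACME", "price", 1),
    ("ACME", "reputation", -1),
]
for pair, score in aggregate_sentiment(classified).items():
    print(pair, round(score, 2))
```

Keeping the key as an (object, feature) pair preserves the drill-down path from an aggregated indicator back to the sentences behind it.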
Supporting the decision making process
The Decision Support techniques: Analysis and visualization
FIRST Acquisition & Analytical Pipelines
Machine Learning Techniques
Qualitative Modeling
Knowledge Base
Visualization Techniques
Outputs:
Forecasting Models
Forecasts of volatility or returns, Alert on pump and dump, Reputation change of a counterpart
Signals, Charts, Topic Spaces, Topic Trends, Reports, …
Glassbox model
Sentiment
Drill down
Objects
Features
Document
sentences
Sentiment analysis & decision making
The integrated model of FIRST and its innovations
Main areas of research
• Sentiment analysis
• DSS models
• Stream visualization
• Scaling strategy
Early adopters
• Slovenian presidential elections
• GAMA Perception Analytics
In the following slides we will quickly review results from incorporating sentiments
in retail brokerage, investment management and reputational risk scenarios.
Motivation
Vision
Innovation
Tools
The three FIRST use cases &
their relevance for the industry
Market Surveillance
Capital markets compliance can be automated today using structured data, but
the automation does not take unstructured data into account
FIRST will
bring large volumes of unstructured data into financial compliance;
develop automated techniques to better detect market abuse/insider trading.
Reputational Risk Management
No off-the-shelf solutions or methodologies for reputational risk management.
FIRST will
provide a sustainable tool for reputational risk monitoring;
help break new ground in this field, which has a dramatically high impact in FSI.
Retail Brokerage
Today, mainly based on quantitative analysis and key figures.
FIRST will
use unstructured data to leverage both information for private investors and
sophisticated tools for professional users.
The three use cases in the words of
the FIRST UC owners
UC#1 – Market Surveillance
“The development of surveillance scenarios based on unstructured information will allow the
compliance offices to better investigate on unusual and suspicious trading activities and to
better understand trends and patterns” – Stefan Queck, Business Dev. Manager at NEXT.
“Especially in time of financial crises, new regulatory requirements and reputation loss risks,
the financial industry is interested in new methods and approaches to detect abuse trading
behaviour”– Wolfgang Fabisch, CEO at NEXT.
UC#2 – Reputational Risk Management
“From the early prototype release, we are looking forward to utilising in a real-life
environment the FIRST solution” – Maria Costante, Responsible for reputational risk
modelling and Pillar 3 at Gruppo Montepaschi.
“We already discussed the tool we are setting up in European contexts, and we are looking
forward to presenting the first results, already in 1H/2012” – Giorgio Aprile, Head of
Reputational and Operational Risks at Gruppo Montepaschi.
UC#3 – Retail Brokerage
“When presenting the usecase to potential customers, they showed interest in this kind of
data and the resulting tools” – Michael Diefenthäler, Director of Product Mgmt at IDMS.
“We are looking forward to present the FIRST results to a variety of customers.” – Peter
Heister, Head of Sales EMEA at IDMS.
Reputational risk
… Need to integrate online unstructured data analysis with the current analysis
of structured financial data
Query
• on-demand
• routine
Sentiment
analysis
Ontology
IE
Reputation
cockpit
Reputational Risk
Index (RI) Model
Application
scenarios
Unstructured
sources
Customer and product data (internal sources)
Performance
Mismatching
Volumes
Nr. Customers
Risk reporting:
• reputational
trends for each
counterpart
• events/topic
• data sources
drill-downs
• …
What-if scenarios:
• events
• probability-weighted risk scenarios
• …
Structured
sources
Goal:
to measure and report, in quasi-real time, on reputational risk, using internal as well as external data
sources, to be integrated into a single reputation engine and application scenario
Retail brokerage @ work
Sentiments: leveraging the investment process by assessing unstructured information
1. Relieve the actor of repeatedly reviewing various sources by automating this task
2. Provide different levels of sentiments, e.g. for single instruments and sectors
3. Support individual decision making by incorporating sentiments
Market surveillance @ work
Typically thinly-traded stocks
Blog A, Blog B, Blog C, Twitter
2) Disseminating inaccurate or misleading information
4) Selling on artificial price level
[Chart axes: p, t]
• Identification and classification of unstructured information
• Readily understandable generation of alerts
• Functionalities to handle alerts
• Comparison of market, institute-specific and unstructured information
Decision support in evaluation of suspicious constellations
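One way to picture the alert generation described above is a rule that fires when online mentions of a thinly-traded stock surge while its price climbs. This is a hypothetical rule written for illustration, not the rule set used in FIRST; the function name and all three thresholds (`thin_volume`, `mention_ratio`, `price_rise`) are assumptions.

```python
def pump_and_dump_alert(avg_daily_volume, mentions, prices,
                        thin_volume=50_000, mention_ratio=3.0, price_rise=0.2):
    """Flag a thinly-traded stock whose online mentions and price jump together.

    mentions and prices are daily series (at least two entries each); the last
    entry is the current day and the earlier mention counts form the baseline.
    """
    if avg_daily_volume > thin_volume:
        return False  # only thinly-traded stocks are in scope here
    baseline = sum(mentions[:-1]) / len(mentions[:-1])
    mention_surge = mentions[-1] >= mention_ratio * max(baseline, 1.0)
    price_jump = (prices[-1] - prices[0]) / prices[0] >= price_rise
    return mention_surge and price_jump


# Made-up data: mentions jump from about 2 per day to 12 while the price gains 30%.
print(pump_and_dump_alert(20_000, [2, 3, 1, 12], [1.00, 1.02, 1.05, 1.30]))
# -> True
```

Because each condition is an explicit threshold, a compliance officer can see why an alert fired, which is in the spirit of the understandable alerts listed above.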
Market surveillance @ work
Structured Information
Market data
Instrument
Reference data
Ad-hoc news
Transaction
data
Employee data
Order data
Benefits for the market
Broadened approach to
detecting suspicious trading
behaviour
Early recognition of trends
and patterns
Decision support in
investigation and escalation
Sentiment Analysis
Scenario Analysis
Unstructured Information
Blogs
Discussion
Forums
„News“
Social
Networks
Analytic Models
Visualisation
Real-life implementation @ B-NEXT, Germany.
Contact: [email protected]
Stay tuned (http://project-first.eu/)
Acknowledgement
The research leading to these results has received funding from the
European Community's Seventh Framework Programme
(FP7/2007-2013) under grant agreement n°257928.
THANKS