Text mining SEC Filings for Fraud Detection

Text Mining SEC Filings for Fraud Detection
Fletcher Glancy
ISQS 7342
Research Issues
1. Can fraud be detected from SEC filings?
2. Can text mining provide a methodology for
detection of potential fraud?
3. If text mining can provide an indication of
potential fraud, which algorithm gives the
best performance?
12/2/2008
Fletcher Glancy
Brief Background
• Corporate governance fraud has been a major
concern, e.g., Enron, WorldCom, HealthSouth.
• Detection has typically come only after many years of abuse.
• Most detection techniques involve ratio analysis.
• Churyk et al. used Context Analysis to detect
fraud in the MD&A of 10-K filings.
Potential Strengths of Text Mining
• TM can be automated.
• The results can be used for further data
mining.
• TM eliminates researcher bias that is
potentially present in Context Analysis.
Potential Problems/Weaknesses
• There is no context in text mining, only
statistics.
• It is difficult to understand the relationships
in a document-term matrix.
• Text mining cannot handle negation or punctuation.
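
The loss of context can be illustrated with a minimal sketch. The two sentences below are made up for illustration and mean opposite things, yet a plain word count (a bag of words, assumed here as the simplest text-mining representation) treats them as identical because word order and punctuation are discarded.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, replace punctuation with spaces, and count the remaining tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return Counter(cleaned.split())

# Hypothetical sentences: "not" negates a different clause in each.
a = bag_of_words("Revenue was not restated; earnings were restated.")
b = bag_of_words("Revenue was restated; earnings were not restated.")

print(a == b)  # True: the counts cannot tell the two statements apart
```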
Narrow the Focus - Negatives
• Antonyms – Word Opposites.
• Negatives – not good = bad.
• Interference by articles:
“Not a good day.”
• Interference by modifiers:
“Not highly motivated.”
Possible Data Preparation Options
• Preprocess to remove articles.
• Convert punctuation to text:
replace ‘;’ with “semicolon”.
• Combine “not” with the word it negates, skipping intervening modifiers:
“not highly motivated” becomes
“highly not_motivated”.
• Replace each combined not_ token with its antonym where one exists:
“not_dead” is replaced with “alive”.
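
The four options above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the article, modifier, punctuation, and antonym tables are tiny hypothetical stand-ins for real lexicons, and the function names are invented for this example. The steps run in the order listed on the slide, so each one can also be applied on its own.

```python
import re

ARTICLES = {"a", "an", "the"}
# The three tables below are tiny, hypothetical stand-ins for real lexicons.
PUNCTUATION = {";": "semicolon", ",": "comma", ".": "period"}
MODIFIERS = {"highly", "very", "quite"}
ANTONYMS = {"not_dead": "alive", "not_good": "bad"}

def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z0-9]+|[;,.]", text.lower())

def remove_articles(tokens):
    return [t for t in tokens if t not in ARTICLES]

def punctuation_to_text(tokens):
    return [PUNCTUATION.get(t, t) for t in tokens]

def attach_not(tokens):
    """Join "not" to the next word, letting modifiers pass through first."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] != "not":
            out.append(tokens[i])
            i += 1
            continue
        j = i + 1
        mods = []
        while j < len(tokens) and tokens[j] in MODIFIERS:
            mods.append(tokens[j])
            j += 1
        is_word = (j < len(tokens) and tokens[j].isalnum()
                   and tokens[j] not in PUNCTUATION.values())
        if is_word:
            out.extend(mods + ["not_" + tokens[j]])
            i = j + 1
        else:  # nothing to negate: keep the tokens as they were
            out.extend(["not"] + mods)
            i = j
    return out

def replace_antonyms(tokens):
    return [ANTONYMS.get(t, t) for t in tokens]

def preprocess(text):
    tokens = tokenize(text)
    for step in (remove_articles, punctuation_to_text, attach_not, replace_antonyms):
        tokens = step(tokens)
    return " ".join(tokens)

print(preprocess("Not a good day"))        # bad day
print(preprocess("Not highly motivated"))  # highly not_motivated
print(preprocess("not dead"))              # alive
```

Removing articles first is what lets "not" reach "good" in the first example; without that step the negation would attach to the article instead.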
Testing Data Preparation Options
• Select/create a text database.
– 10-K notes and MD&A.
– Firms that have received an AAER (Accounting and Auditing Enforcement Release).
• Preprocess with each alternative individually
and cumulatively.
• Create the document-term matrix and compute its SVD.
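
The last step can be sketched with NumPy. This is an assumption-laden illustration, not the study's procedure: the three documents are made-up stand-ins for preprocessed filing text, `document_term_matrix` is a hypothetical helper, and the number of retained dimensions `k` is chosen arbitrarily.

```python
from collections import Counter
import numpy as np

# Toy, made-up documents standing in for preprocessed filing text.
docs = [
    "revenue growth strong",
    "revenue restated loss",
    "loss restated not_disclosed",
]

def document_term_matrix(documents):
    """Return (matrix, vocabulary): rows are documents, columns are term counts."""
    counts = [Counter(d.split()) for d in documents]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[t] for t in vocab] for c in counts], dtype=float)
    return matrix, vocab

dtm, vocab = document_term_matrix(docs)

# Thin SVD: dtm = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)

k = 2  # number of latent dimensions to keep (an arbitrary choice here)
doc_coords = U[:, :k] * s[:k]  # each row is one document in the reduced space

print(vocab)
print(doc_coords.shape)  # (3, 2)
```

Running the same code on the output of each preprocessing alternative would give one reduced representation per alternative, which is what the comparison needs.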
Testing Data Preparation Options
• Calculate variance of document set using SVD.
• Fit a logistic regression on the SVD representation
of the document set and calculate variance.
• Test predictive performance on a validation set.
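
The three steps above can be sketched end to end. Everything here is an assumption made for illustration: the term counts and labels are randomly generated, "variance" is read as each singular value's share of the total squared singular values of the mean-centered training matrix, and the logistic regression is fitted with plain gradient descent rather than any particular statistics package.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_data(n_per_class=30, n_terms=8):
    """Synthetic term counts; label 1 stands for a firm that received an AAER."""
    clean_rates = np.linspace(1.0, 3.0, n_terms)
    fraud_rates = clean_rates[::-1]
    X = np.vstack([
        rng.poisson(clean_rates, (n_per_class, n_terms)),
        rng.poisson(fraud_rates, (n_per_class, n_terms)),
    ]).astype(float)
    y = np.array([0] * n_per_class + [1] * n_per_class)
    order = rng.permutation(len(y))
    return X[order], y[order]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def add_intercept(Z):
    return np.hstack([np.ones((len(Z), 1)), Z])

def fit_logistic(Z, y, lr=0.1, steps=2000):
    """Gradient descent on the mean logistic loss; returns the weight vector."""
    Zb = add_intercept(Z)
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        w -= lr * Zb.T @ (sigmoid(Zb @ w) - y) / len(y)
    return w

X, y = make_toy_data()
X_train, y_train = X[:40], y[:40]
X_valid, y_valid = X[40:], y[40:]

# SVD of the mean-centered training matrix only, so the validation
# documents play no part in choosing the latent dimensions.
mean = X_train.mean(axis=0)
U, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
variance_share = s**2 / np.sum(s**2)

k = 2  # retained dimensions (arbitrary for this sketch)
Z_train = (X_train - mean) @ Vt[:k].T
Z_valid = (X_valid - mean) @ Vt[:k].T

w = fit_logistic(Z_train, y_train)
predicted = sigmoid(add_intercept(Z_valid) @ w) >= 0.5
accuracy = np.mean(predicted == y_valid)

print("variance share of first", k, "dimensions:", variance_share[:k].sum())
print("validation accuracy:", accuracy)
```

Projecting the validation documents with the training-set `Vt` keeps the validation set unseen during fitting, which is what makes the accuracy a fair check of predictability.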
Questions?
Welcome to my potential
dissertation topic!