Analysis of Complex Systems
John Sherwood
Period 2
Abstract
My project applies data mining techniques to the
internet to gather enough information for a genetic
algorithm to perform trend analysis on a complex
system, e.g. the stock market.
Scope
The most fundamental element of my program is
finding a correlation between news about a
company and the price of its stock. To do this,
a huge amount of data on both stock prices and
company news must be processed into a
quantitative format and then extensively analyzed.
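
As a minimal sketch of what "a correlation" between quantified news and prices could mean, the block below computes a Pearson correlation coefficient. The choice of Pearson correlation, the function name, and the sample values are all assumptions made for illustration; the slides do not say which measure the project uses, and the numbers are not real market data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical values, invented for illustration only:
# one news score and one price change per day.
news_scores = [0.2, -0.5, 0.9, 0.1, -0.3]
price_changes = [0.4, -1.0, 1.8, 0.2, -0.6]

r = pearson(news_scores, price_changes)
print(f"correlation: {r:.3f}")
```

A value near 1 or -1 would suggest a strong linear relationship, and a value near 0 would suggest none.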
Expected Results
In this project, I expect at the very least to have a
useful genetic algorithm that, given a list of
independent and dependent data, can generate
equations expressing a tentative correlation. While
the extremely chaotic nature of this specific
application may prevent quantitative success in
this instance, I do expect success in
general terms.
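
The following is a rough sketch of how a genetic algorithm might fit an equation to independent and dependent data. It is an assumption-laden simplification: it evolves only the two coefficients of a fixed linear form y = a*x + b, whereas the project describes generating equations more generally. The function names, population size, and mutation scale are all hypothetical.

```python
import random

def mean_squared_error(coeffs, xs, ys):
    """Fitness: lower is better."""
    a, b = coeffs
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def evolve_line(xs, ys, generations=300, pop_size=40, seed=0):
    """Evolve (a, b) for y = a*x + b using selection, crossover, and mutation."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [(rng.uniform(-10, 10), rng.uniform(-10, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best quarter unchanged (elitism).
        population.sort(key=lambda c: mean_squared_error(c, xs, ys))
        survivors = population[: pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: each coefficient comes from one of the two parents.
            # Mutation: add small Gaussian noise to each coefficient.
            child = tuple(rng.choice(pair) + rng.gauss(0, 0.3)
                          for pair in zip(p1, p2))
            children.append(child)
        population = survivors + children
    return min(population, key=lambda c: mean_squared_error(c, xs, ys))

# Illustrative data generated from y = 2x + 1 (not real measurements).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
best = evolve_line(xs, ys)
print("best coefficients:", best)
print("error:", mean_squared_error(best, xs, ys))
```

Because the best candidates survive each generation unchanged, the best error never gets worse from one generation to the next.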
Others' Work
Due to the very lucrative nature of a program that
could predict the stock market:
Many have tried
All have failed
Procedures
Differ for each part of the program
Data Mining Analysis
Determination of data parsing sequences to extract
information from HTML (Hypertext Markup
Language)
Quantitative tests of success
XML parser
Data classification
Trial and error tests
Evaluation algorithms
Discriminant generation
Design
Several program segments:
Data mining algorithms
Price data logger
News parser
Data analysis algorithms
Heuristic Generator
Genetic Equation Regression
Command Shell
Unifies elements of program
Graph Generator
Written in PHP to build PNG graphs of data
Program Tests
XML parser tests succeed on properly formatted
XML/XHTML and are sufficiently successful
on malformed XHTML
Stock price logging working perfectly
Generalized equations work but are semi-timeframe-specific
Algorithms
Different program segments use different
algorithms
Data mining algorithm
Discriminant (Heuristic) Generation Algorithm
XML parsing algorithm
Equation Refinement algorithm
Data Mining
Built on the XML parser, which converts XHTML
code into programming objects
New algorithm allows parsing of Google
Finance pages and following links to other sites
for more data
Different algorithms required to parse different
websites for information
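
To make the link-following idea concrete, here is a small sketch that collects links from a page fragment so they could be fetched later. It swaps in Python's standard `html.parser` module in place of the project's own XML parser. The class name, the sample markup, and the `example.com` addresses are placeholders invented for illustration, not the pages the project actually mined.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects (absolute_url, link_text) pairs from anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # set while inside an <a href=...> element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Placeholder fragment; note the unclosed <p>, as found on real pages.
page = ('<div><a href="/news/1">Acme beats estimates</a>'
        '<p>More: <a href="http://example.com/story">Full story</a></div>')

collector = LinkCollector("http://example.com/finance")
collector.feed(page)
for url, text in collector.links:
    print(url, "-", text)
```

Each collected address could then be downloaded and passed to a site-specific routine, since different sites need different extraction rules.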
XML Parser
Two potential paradigms
Iterative
Uses a set of flags to determine what action to take with
each character
Recursive
Splits XML document into sets of tags and processes
each tag's child elements
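
The iterative paradigm can be sketched roughly as follows: a single flag records whether the current character is inside a tag, and a stack tracks the open elements. This is an illustrative simplification, not the project's parser. It ignores attributes, comments, and entities, and the dictionary layout is an assumption.

```python
def parse_iterative(markup):
    """Build a nested dict tree from simple markup, one character at a time."""
    root = {"tag": "#root", "children": [], "text": ""}
    stack = [root]      # currently open elements, innermost last
    in_tag = False      # flag: are we between '<' and '>'?
    buffer = []

    def flush_text():
        text = "".join(buffer).strip()
        if text:
            stack[-1]["text"] += text

    for ch in markup:
        if ch == "<" and not in_tag:
            flush_text()
            buffer, in_tag = [], True
        elif ch == ">" and in_tag:
            body = "".join(buffer).strip()
            buffer, in_tag = [], False
            if body.startswith("/"):
                # Closing tag: unwind to the matching open element.
                # An unmatched closing tag is ignored (tolerates bad markup).
                name = body[1:].strip()
                for i in range(len(stack) - 1, 0, -1):
                    if stack[i]["tag"] == name:
                        del stack[i:]
                        break
            else:
                parts = body.rstrip("/").split()
                if not parts:
                    continue  # skip an empty "<>"
                node = {"tag": parts[0], "children": [], "text": ""}
                stack[-1]["children"].append(node)
                if not body.endswith("/"):  # "<br/>" has no children
                    stack.append(node)
        else:
            buffer.append(ch)
    if not in_tag:
        flush_text()  # text after the final tag
    return root

tree = parse_iterative(
    "<news><item>Acme up</item><br/><item>Beta down</item></news>")
news = tree["children"][0]
print([child["tag"] for child in news["children"]])
```

A recursive version would instead find each tag's extent and call itself on the content between the opening and closing tags. The stack here plays the role that the call stack plays in that version.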
Problems
Malformed XHTML forces extensive testing of
data mining for each new site
The huge variety in formatting across websites
makes it difficult to ensure that only news
data is mined
Results and Conclusions
The equations generated by my equation refiner
are accurate within the timeframe they are
generated for and become less accurate with
distance from it, whether into the past or the
future. This implies that the effect of news on
stock prices is not constant but depends on the
timeframe being examined.
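
One way such a check could be set up is sketched below: fit an equation on one window of time, then measure its error in every other window. The sketch uses an ordinary least-squares line in place of the equation refiner, and the data is synthetic, deliberately constructed so that the news-to-price relationship drifts over time. It shows how the effect could be measured and is not a reproduction of the project's results. All names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def window_errors(xs, ys, fit_start, fit_end, width):
    """Fit on [fit_start, fit_end), then report the mean absolute error
    in each consecutive window of `width` samples, keyed by start index."""
    a, b = fit_line(xs[fit_start:fit_end], ys[fit_start:fit_end])
    errors = {}
    for start in range(0, len(xs) - width + 1, width):
        window = range(start, start + width)
        errors[start] = sum(abs(a * xs[i] + b - ys[i]) for i in window) / width
    return errors

# Synthetic series: the news score cycles through -2..2, and the
# price response per unit of news grows slowly with time t.
times = range(60)
news = [(t % 5) - 2 for t in times]
prices = [(1 + 0.05 * t) * x for t, x in zip(times, news)]

errors = window_errors(news, prices, fit_start=20, fit_end=30, width=10)
for start, err in errors.items():
    print(f"t = {start:2d}..{start + 9:2d}: mean abs error {err:.2f}")
```

On this constructed data the error is smallest in the fitted window and grows in both directions away from it.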