Analysis of Complex Systems
John Sherwood
Period 2
Abstract
My project uses data mining techniques on the
Internet to gather enough information for a
genetic algorithm to perform trend analysis of a
complex system, e.g. the stock market.
The most fundamental element of my program is
finding a correlation between news about a
company (and its stock) and the price of the stock
itself. To do this, a huge amount of data
on both stock prices and news regarding
companies must be converted into a quantitative
format and then analyzed extensively.
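One way to picture the correlation step, once news has been reduced to numbers, is a Pearson coefficient between a daily news score and the same day's price change. This is only a sketch under assumptions: the function name pearson, the news scores, and the price changes below are invented for illustration, and the project may quantify news quite differently.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: a daily news score and that day's price change.
news_scores = [0.2, -0.5, 0.9, 0.1, -0.3]
price_changes = [0.4, -1.1, 1.7, 0.3, -0.5]
r = pearson(news_scores, price_changes)
```

A value of r near +1 or -1 would suggest a strong linear relationship in the sample; a value near 0 would suggest none.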
Expected Results
In this project, I expect at the very least to have
a useful genetic algorithm that, given a list of
independent and dependent data, can generate
equations expressing a tentative correlation. While
the extremely chaotic nature of this specific
application may prevent quantitative success in
this instance, I do expect to have success in
general terms.
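As a rough illustration of the idea, the sketch below evolves the coefficients of a polynomial so that it fits a list of independent (xs) and dependent (ys) values. It is a simplification and not the project's algorithm: the equation form is fixed to a polynomial, and the function names, population size, mutation width, and sample data are all assumptions made for the example.

```python
import random

def predict(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def error(coeffs, xs, ys):
    """Mean squared error of a candidate equation; lower is fitter."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def evolve(xs, ys, degree=2, pop_size=60, generations=300, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [[rng.uniform(-5, 5) for _ in range(degree + 1)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest quarter unchanged (elitism).
        population.sort(key=lambda c: error(c, xs, ys))
        survivors = population[:pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: each coefficient comes from one parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: nudge one randomly chosen coefficient.
            i = rng.randrange(len(child))
            child[i] += rng.gauss(0, 0.3)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda c: error(c, xs, ys))

# Hypothetical test data generated from y = 1 + 2x + 0.5x^2.
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]
best = evolve(xs, ys)
```

Because the fittest candidates are carried over unchanged, the best error never gets worse from one generation to the next.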
Others' Work
Due to the very lucrative nature of a program that
could predict the stock market:
Many have tried
All have failed
Differs for each part of program
Data Mining
Determination of data parsing sequences to extract
information from HTML (Hypertext Markup Language)
XML parser
Trial and error tests
Analysis
Quantitative tests of success
Data classification
Evaluation algorithms
Discriminant generation
Several program segments:
Data mining algorithms
Price data logger
News parser
Data analysis algorithms
Command Shell
unifies elements of program
Graph Generator
Written in PHP to build PNG graphs of data
Program Tests
XML parser tests succeed in parsing properly
formatted XML/XHTML and are sufficiently
successful in parsing malformed XHTML
Stock logging works perfectly
Equation refinement algorithms work perfectly
on test data
Different program segments use different algorithms
Data mining algorithm
Discriminant Generation Algorithm
XML parsing algorithm
Equation Refinement algorithm
Data Mining
Based on XML parser to convert XHTML code
to programming objects
New algorithm allows for parsing of Google
Finance pages
Different algorithms required to parse different
websites for information
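A minimal sketch of turning XHTML into programming objects follows. It swaps in Python's standard xml.etree.ElementTree parser in place of the project's own XML parser, and the page fragment, the class names "quote", "symbol", and "price", and the Quote type are all hypothetical; real pages such as Google Finance use different markup, which is why each site needs its own extraction logic.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Quote:
    symbol: str
    price: float

# Hypothetical, well-formed XHTML fragment (not taken from any real site).
PAGE = """
<div>
  <table>
    <tr class="quote"><td class="symbol">ABC</td><td class="price">12.50</td></tr>
    <tr class="quote"><td class="symbol">XYZ</td><td class="price">101.25</td></tr>
  </table>
</div>
"""

def extract_quotes(xhtml):
    """Parse the markup and build one Quote object per matching table row."""
    root = ET.fromstring(xhtml)
    quotes = []
    for row in root.iter("tr"):
        if row.get("class") != "quote":
            continue
        # Map each cell's class attribute to its text content.
        cells = {td.get("class"): (td.text or "").strip() for td in row.iter("td")}
        quotes.append(Quote(cells["symbol"], float(cells["price"])))
    return quotes

quotes = extract_quotes(PAGE)
```

ElementTree only accepts well-formed input, so malformed pages would need to be cleaned up first or handled by a more tolerant parser.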
XML Parser
Two potential paradigms
Uses a set of flags to determine what action to take with
each character
Splits XML document into sets of tags and processes
each tag's child elements
Malformed XHTML forces extensive testing of
data mining for each new site
Huge variety in formatting of different websites
makes mining slow
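The first paradigm, a set of flags consulted for each character, might look roughly like the sketch below. It is an assumption-laden illustration and not the project's parser: it uses a single flag (in_tag), only splits markup into tag and text tokens, and ignores cases such as comments or a '>' inside an attribute value.

```python
def tokenize(markup):
    """Split markup into ("tag", ...) and ("text", ...) tokens, one character at a time."""
    tokens = []
    buffer = []
    in_tag = False  # flag: currently between '<' and '>'
    for ch in markup:
        if ch == "<" and not in_tag:
            # A tag is starting: flush any text collected so far.
            text = "".join(buffer).strip()
            if text:
                tokens.append(("text", text))
            buffer = []
            in_tag = True
        elif ch == ">" and in_tag:
            # The tag is complete: record its contents.
            tokens.append(("tag", "".join(buffer).strip()))
            buffer = []
            in_tag = False
        else:
            buffer.append(ch)
    # Leftover characters (trailing text or an unterminated tag) are kept as text.
    rest = "".join(buffer).strip()
    if rest:
        tokens.append(("text", rest))
    return tokens

# The unclosed <br> does not stop the scan, since nesting is not checked here.
tokens = tokenize("<p>Price: <b>12.50</b><br>up today</p>")
```

Because the scanner never checks that tags are balanced, it keeps going on malformed XHTML; matching tags to their child elements would be a separate step.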
Results and Conclusions
Actual data, rather than test data, may be fed into
the regression algorithms within a matter of days
once Reuters and a few other select websites are
made parseable with new algorithms.