
Analysis of Complex Systems
John Sherwood
Period 2
Abstract
My project uses data mining techniques on the internet to gather enough information for a genetic algorithm to perform trend analysis of a complex system, e.g. the stock market.
Scope
The most fundamental element of my program is establishing a correlation between news about a company and its stock, and the price of the stock itself. To do this, a huge amount of data on both stock prices and news about companies must be converted into a quantitative format and then analyzed extensively.
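Once the news has been turned into numbers, one simple way to measure a correlation between it and the stock price is the Pearson correlation coefficient. The sketch below is a hypothetical illustration, not the project's own code: it assumes each day's news has already been reduced to a single numeric score, and the function name `pearson` and all sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and spreads; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Made-up example: a hypothetical daily "news score" (positive = good news)
# and the stock's percentage price change on the same days.
news_scores = [0.8, -0.2, 0.1, -0.9, 0.5]
price_changes = [1.2, -0.4, 0.3, -1.5, 0.6]
r = pearson(news_scores, price_changes)  # close to +1: strongly related
```

A coefficient near +1 or -1 indicates a strong linear relationship and one near 0 indicates none; on its own it does not show that the news causes the price movement.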
Expected Results
At the very least, I expect this project to produce a very useful genetic algorithm that, given lists of independent and dependent data, can generate equations that establish a tentative correlation. While the extremely chaotic nature of this particular application may prevent quantitative success here, I do expect the algorithm to succeed in general terms.
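The kind of genetic algorithm described above can be sketched as follows, under assumptions that are ours rather than the project's: an "equation" is taken to be a polynomial of fixed degree, each individual is its list of coefficients, and fitness is the mean squared error against the dependent data. The names `evolve_equation` and `mse`, and every parameter value (population 60, 300 generations, mutation spread 0.5), are illustrative.

```python
import random


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial sum(c_i * x**i) against ys."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(c * x ** i for i, c in enumerate(coeffs))
        total += (prediction - y) ** 2
    return total / len(xs)


def evolve_equation(xs, ys, degree=1, pop_size=60, generations=300, seed=0):
    """Evolve polynomial coefficients (constant term first) that fit ys from xs."""
    rng = random.Random(seed)
    population = [[rng.uniform(-10, 10) for _ in range(degree + 1)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by error and keep the fitter half unchanged (elitism).
        population.sort(key=lambda c: mse(c, xs, ys))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            # Crossover: each coefficient is copied from one parent at random.
            child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
            # Mutation: nudge one randomly chosen coefficient.
            i = rng.randrange(len(child))
            child[i] += rng.gauss(0, 0.5)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda c: mse(c, xs, ys))


# Usage: recover y = 2x + 1 from ten noise-free points.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
best = evolve_equation(xs, ys)
```

Keeping the fitter half unchanged each generation means the best equation found never gets worse; a fuller version might also evolve the form of the equation, not just its coefficients.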
Others' Work
Because a program that could predict the stock market would be so lucrative:

Many have tried

All have failed
Procedures

Differs for each part of the program

Data Mining

Determination of data parsing sequences to extract information from HTML (Hypertext Markup Language)

Quantitative tests of success

XML parser

Analysis

Data classification

Trial and error tests

Evaluation algorithms

Discriminant generation
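As one possible form of a quantitative test of data mining success, a parser's output for a page can be compared against a hand-labelled list of the news items that page actually contains. This is an assumed testing approach, not one stated in the project; `precision_recall` and the sample headlines are hypothetical.

```python
def precision_recall(extracted, expected):
    """Score items a parser extracted against a hand-labelled reference set."""
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    # Precision: share of extracted items that really are news.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of the real news items that were extracted.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical run: the parser found two real headlines plus one
# navigation link, and missed a third headline.
found = ["Acme beats estimates", "Acme opens new plant", "Sign in"]
truth = ["Acme beats estimates", "Acme opens new plant", "Acme CEO resigns"]
p, r = precision_recall(found, truth)
```

Precision falls when non-news text (menus, sign-in links) is mined; recall falls when real headlines are missed.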
Design

Several program segments:


Data mining algorithms

Price data logger

News parser
Data analysis algorithms

Heuristic Generator

Equation Regression


Command Shell


Genetic
Unifies the elements of the program
Graph Generator

Written in PHP to build PNG graphs of data
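A price data logger like the one in the design above could be as small as the following sketch. Everything here is an assumption for illustration: the function `log_prices`, the CSV layout, the ticker symbols and prices, and the `fetch_price` callable, which stands in for whatever code actually downloads and parses a quote.

```python
import csv
import io
from datetime import datetime, timezone


def log_prices(fetch_price, symbols, out, now=None):
    """Append one (UTC timestamp, symbol, price) CSV row per symbol to `out`.

    `fetch_price` is a hypothetical callable mapping a ticker symbol to a
    price; a real logger would download and parse a quote page here.
    """
    stamp = (now if now is not None else datetime.now(timezone.utc)).isoformat()
    writer = csv.writer(out)
    for symbol in symbols:
        writer.writerow([stamp, symbol, f"{fetch_price(symbol):.2f}"])


# Usage with a stubbed fetcher, made-up prices and an in-memory file.
fake_quotes = {"AAA": 12.5, "BBB": 101.25}
buffer = io.StringIO()
log_prices(fake_quotes.__getitem__, ["AAA", "BBB"], buffer,
           now=datetime(2000, 1, 3, 14, 30, tzinfo=timezone.utc))
rows = buffer.getvalue().splitlines()
```

Passing the fetcher in as a parameter lets the logger be tested without a network connection. When logging to a real file, the `csv` module documentation recommends opening it with `newline=""`, for example `open("prices.csv", "a", newline="")`.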
Program Tests



XML parser tests show correct parsing of properly formatted XML/XHTML and sufficient success in parsing malformed XHTML
Stock price logging works perfectly
Generalized equations work but are partly specific to the timeframe they were generated for
Algorithms

Different program segments use different
algorithms

Data mining algorithm

Discriminant (Heuristic) Generation Algorithm

XML parsing algorithm

Equation Refinement algorithm
Data Mining



Based on the XML parser, which converts XHTML code into programming objects
A new algorithm allows Google Finance pages to be parsed, following links to other sites for more data
Different websites require different parsing algorithms to extract information
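Converting XHTML into programming objects and following links can be illustrated with Python's standard `xml.etree.ElementTree` module standing in for the project's own XML parser. The sketch is hypothetical: the sample markup and the example.com URLs are invented and do not reflect the structure of real Google Finance pages, and `extract_links` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_links(xhtml, base_url):
    """Parse well-formed XHTML into element objects and return absolute link URLs."""
    root = ET.fromstring(xhtml)
    links = []
    for element in root.iter():
        # Match <a> whether or not the page declares the XHTML namespace.
        if element.tag in ("a", XHTML_NS + "a"):
            href = element.get("href")
            if href:
                # Resolve relative links such as "/news?id=1" against the page URL.
                links.append(urljoin(base_url, href))
    return links


page = """<html><body>
  <div class="news"><a href="/news?id=1">Acme beats estimates</a></div>
  <div class="news"><a href="http://example.com/story">Full story</a></div>
</body></html>"""
links = extract_links(page, "http://finance.example.com/quote")
```

A crawler built on this would fetch each returned URL and hand it to an extractor written for that particular site.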
XML Parser

Two potential paradigms

Iterative


Uses a set of flags to determine what action to take with
each character
Recursive

Splits the XML document into tags and recursively processes each tag's child elements
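The two paradigms can be contrasted in a simplified sketch. Neither function is the project's parser: both are hypothetical illustrations that handle elements, self-closing tags and text only (attributes are read past but discarded, and comments, CDATA and entities are not supported). `parse_iterative` walks the document one character at a time and lets an `in_tag` flag decide what each character means; `parse_recursive` first splits the document into tag and text tokens and then builds each element's children with a recursive call. Both return nested `(tag, children)` tuples and raise `ValueError` on mismatched tags.

```python
import re


def parse_iterative(doc):
    """Iterative paradigm: scan character by character, driven by a flag."""
    root = ("#root", [])
    stack = [root]          # open elements; the innermost is stack[-1]
    in_tag = False          # flag: are we between '<' and '>'?
    buf = ""
    for ch in doc:
        if not in_tag and ch == "<":
            if buf.strip():                      # flush pending text content
                stack[-1][1].append(buf.strip())
            buf, in_tag = "", True
        elif in_tag and ch == ">":
            if buf.startswith("/"):              # closing tag
                if stack[-1][0] != buf[1:].strip():
                    raise ValueError(f"unexpected </{buf[1:].strip()}>")
                stack.pop()
            else:                                # opening or self-closing tag
                node = (buf.rstrip("/").split()[0], [])
                stack[-1][1].append(node)
                if not buf.endswith("/"):
                    stack.append(node)
            buf, in_tag = "", False
        else:
            buf += ch
    if len(stack) > 1:
        raise ValueError(f"missing </{stack[-1][0]}>")
    return root[1]


def parse_recursive(doc):
    """Recursive paradigm: split into tag/text tokens, then recurse into children."""
    tokens = [t for t in re.split(r"(<[^>]+>)", doc) if t.strip()]

    def parse_children(i, closing):
        # Collect nodes until the closing tag named `closing` (None = top level).
        children = []
        while i < len(tokens):
            tok = tokens[i]
            if tok.startswith("</"):
                if tok[2:-1].strip() != closing:
                    raise ValueError(f"unexpected {tok}")
                return children, i + 1
            if tok.startswith("<"):
                body = tok[1:-1]
                name = body.rstrip("/").split()[0]
                if body.endswith("/"):           # self-closing: no children
                    children.append((name, []))
                    i += 1
                else:                            # recurse into the element
                    kids, i = parse_children(i + 1, name)
                    children.append((name, kids))
            else:
                children.append(tok.strip())     # text content
                i += 1
        if closing is not None:
            raise ValueError(f"missing </{closing}>")
        return children, i

    return parse_children(0, None)[0]


doc = '<list><item id="1">alpha</item><item>beta<br/></item></list>'
tree = parse_iterative(doc)
```

The iterative version needs an explicit stack to remember which elements are still open; the recursive version gets the same bookkeeping from the call stack, which makes it shorter but means extremely deep nesting can exceed Python's recursion limit.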
Problems


Malformed XHTML forces extensive testing of the data mining code for each new site
The huge variety of formatting across websites makes it difficult to ensure that only news data is mined
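One common mitigation for malformed XHTML is to repair the most frequent faults before handing a page to a strict XML parser. The sketch below is an assumed approach, not the project's: `repair_xhtml` and `parse_lenient` are hypothetical names, and the repairs cover only unclosed void tags such as `<br>`, the `&nbsp;` entity, and bare `&` characters.

```python
import re
import xml.etree.ElementTree as ET

# HTML void elements that pages often write as <br> rather than <br/>.
VOID_TAGS = r"(?:br|hr|img|input|meta|link)"


def repair_xhtml(markup):
    """Apply a few common fixes so a strict XML parser will accept the markup."""
    # Self-close void elements: <br> -> <br/>, <img src="x"> -> <img src="x"/>.
    markup = re.sub(r"<(" + VOID_TAGS + r")\b([^>]*?)(?<!/)>", r"<\1\2/>",
                    markup, flags=re.IGNORECASE)
    # &nbsp; is an HTML entity that plain XML does not define.
    markup = markup.replace("&nbsp;", "&#160;")
    # Escape bare ampersands that do not begin an entity or character reference.
    markup = re.sub(r"&(?!#?\w+;)", "&amp;", markup)
    return markup


def parse_lenient(markup):
    """Parse strictly first; on failure, retry once on the repaired markup."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return ET.fromstring(repair_xhtml(markup))


root = parse_lenient("<div><p>AT&T up 5%<br></p><p>Q3&nbsp;results</p></div>")
```

Pages with deeper problems, such as unclosed `<p>` elements or overlapping tags, will still fail; for those, a tolerant HTML parser such as Python's `html.parser` module is usually a better starting point than repairing the markup for an XML parser.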
Results and Conclusions

The equations generated by my equation refiner are accurate within the timeframe they were generated for and become less accurate the further one moves from that timeframe, into either the past or the future. This implies that the effect of news on stock prices is not constant but depends on the timeframe being considered.
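The timeframe effect can be illustrated on synthetic data. In the sketch below the numbers are invented, generated so that the sensitivity of price changes to news drifts over time (the situation this conclusion suggests), and they are not the project's results. For brevity the equation is fitted with ordinary least squares instead of the genetic equation refiner; `fit_line` and `mean_abs_error` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_abs_error(a, b, xs, ys):
    """Average absolute gap between the fitted line and the observations."""
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (not project results): each day t has a news score x in
# [-1, 1] and a price change y = k(t) * x, where the sensitivity k(t)
# drifts from 1.0 toward 2.0 over the period.
DAYS = 300
scores = [((t * 37) % 11 - 5) / 5 for t in range(DAYS)]
changes = [(1.0 + t / DAYS) * x for t, x in enumerate(scores)]

# "Generate" the equation on days 100-149, then test it on other windows.
a, b = fit_line(scores[100:150], changes[100:150])
errors = {
    "past": mean_abs_error(a, b, scores[0:50], changes[0:50]),
    "same": mean_abs_error(a, b, scores[100:150], changes[100:150]),
    "future": mean_abs_error(a, b, scores[250:300], changes[250:300]),
}
```

With this drifting sensitivity, the error is smallest on the window the equation was fitted to and grows on earlier and later windows, which is the qualitative pattern described in the conclusion.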