Analysis of Complex Systems
John Sherwood
Period 2
Abstract
My project uses data mining techniques on the
internet to gather enough information for a
genetic algorithm to perform trend analysis of a
complex system, e.g. the stock market.
Scope
The most fundamental element of my program is
finding a correlation between news about a
company and the price of its stock. To do this, a
large amount of data on both stock prices and
company news must be converted into a
quantitative format and then analyzed extensively.
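
Once both news and prices are in a quantitative format, the simplest check of a relationship is a correlation coefficient. The sketch below computes a Pearson correlation in plain Python. The per-day news scores and price changes are made-up illustrative numbers, and the name `pearson` is an assumption, not part of the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: one news score and one price change per day
news_scores = [0.2, -0.5, 0.1, 0.7, -0.3]
price_changes = [0.4, -1.1, 0.0, 1.3, -0.2]
r = pearson(news_scores, price_changes)
print(f"correlation: {r:.3f}")
```

A value near +1 or -1 would suggest a strong linear relationship; a value near 0 would suggest none, although a nonlinear relationship could still exist.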
Expected Results
In this project, I expect at the very least to have a
useful genetic algorithm that, given a list of
independent and dependent data, can generate
equations that suggest a tentative correlation. While
the extremely chaotic nature of the stock market
may prevent quantitative success in this instance,
I do expect success in general terms.
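
As a rough illustration of a genetic algorithm that fits an equation to independent and dependent data, the sketch below evolves the two coefficients of a line y = a*x + b. The equation form, population size, mutation width, and all function names are assumptions made for the example; the project's own equation representation is not described here.

```python
import random


def mean_squared_error(coeffs, xs, ys):
    """Average squared gap between a*x + b and the observed y values."""
    a, b = coeffs
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evolve_line(xs, ys, generations=200, pop_size=40, seed=0):
    """Evolve (a, b) pairs; returns the best pair and the best error per generation."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [(rng.uniform(-10, 10), rng.uniform(-10, 10))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the best quarter unchanged
        population.sort(key=lambda c: mean_squared_error(c, xs, ys))
        history.append(mean_squared_error(population[0], xs, ys))
        survivors = population[: pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: each coefficient comes from one of the two parents
            child = tuple(rng.choice(pair) for pair in zip(p1, p2))
            # Mutation: small Gaussian nudge to every coefficient
            child = tuple(gene + rng.gauss(0, 0.5) for gene in child)
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda c: mean_squared_error(c, xs, ys))
    history.append(mean_squared_error(best, xs, ys))
    return best, history


# Hypothetical test data generated from y = 3x + 2
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * x + 2 for x in xs]
best, history = evolve_line(xs, ys)
print("best coefficients:", best, "error:", history[-1])
```

Because the best candidates survive each generation unchanged, the best error can never get worse from one generation to the next. That makes progress easy to track on test data before real data is used.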
Others' Work
Due to the very lucrative nature of a program that
could predict the stock market:

Many have tried

All have failed
Procedures

Differs for each part of program

Data Mining Analysis



Determination of data parsing sequences to extract
information from HTML (Hypertext Markup
Language)
Quantitative tests of success

XML parser

Data classification
Trial and error tests

Evaluation algorithms

Discriminant generation
Design

Several program segments:


Data mining algorithms

Price data logger

News parser
Data analysis algorithms

Genetic algorithm

Command Shell

Unifies elements of program
Graph Generator

Written in PHP to build PNG graphs of data
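
To make the price data logger segment more concrete, here is a minimal sketch that appends timestamped prices to a CSV stream and reads them back. Fetching the price from a website is deliberately left out. The symbol `XYZ`, the prices, and the function names `log_price` and `read_log` are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def log_price(stream, symbol, price, when=None):
    """Append one 'timestamp,symbol,price' row to an open text stream."""
    when = when or datetime.now(timezone.utc)
    csv.writer(stream).writerow([when.isoformat(), symbol, f"{price:.2f}"])


def read_log(stream):
    """Read rows back as (datetime, symbol, float) tuples."""
    return [(datetime.fromisoformat(ts), sym, float(price))
            for ts, sym, price in csv.reader(stream)]


# An in-memory stream stands in for a log file opened with newline=""
log = io.StringIO(newline="")
log_price(log, "XYZ", 12.5, datetime(2024, 1, 2, 15, 30, tzinfo=timezone.utc))
log_price(log, "XYZ", 12.75, datetime(2024, 1, 2, 15, 31, tzinfo=timezone.utc))
log.seek(0)
rows = read_log(log)
print(rows)
```

Storing prices as plain rows keeps the logger independent of the analysis code, which can read the same file later.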
Program Tests



XML parser tests prove successful in parsing
properly formatted XML/XHTML, with sufficient
success in parsing malformed XHTML
Stock logging is working perfectly
Equation refinement algorithms work perfectly
on test data
Algorithms

Different program segments use different
algorithms

Data mining algorithm

Discriminant Generation Algorithm

XML parsing algorithm

Equation Refinement algorithm
Data Mining



Based on an XML parser that converts XHTML code
into programming objects
New algorithm allows for parsing of Google
Finance pages
Different algorithms are required to parse
different websites for information
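
The idea of turning XHTML into programming objects and then pulling fields out of them can be sketched with Python's standard `xml.etree.ElementTree`. This stands in for the project's own parser. The page layout below is invented for illustration and is not the actual markup of Google Finance or any other site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment with a made-up layout
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="quote">
      <span class="symbol">XYZ</span>
      <span class="price">12.50</span>
    </div>
  </body>
</html>"""

# XHTML elements live in this namespace, so queries must name it
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_quote(xhtml_text):
    """Parse the page into element objects and return (symbol, price)."""
    root = ET.fromstring(xhtml_text)
    fields = {}
    # Site-specific rule: every <span> directly inside the quote <div>
    for span in root.iterfind(".//x:div[@class='quote']/x:span", NS):
        fields[span.get("class")] = span.text.strip()
    return fields["symbol"], float(fields["price"])


print(extract_quote(XHTML))  # prints ('XYZ', 12.5)
```

The path expression is the part that would differ from site to site, which is why each new website needs its own extraction rules.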
XML Parser

Two potential paradigms

Iterative


Uses a set of flags to determine what action to take with
each character
Recursive

Splits XML document into sets of tags and processes
each tag's child elements
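
The two paradigms can be sketched side by side. In this simplified example, `tokenize` shows the iterative style, with an `in_tag` flag deciding what each character means. `parse_children` shows the recursive style, handling each tag by processing its child elements. Both functions are illustrative assumptions rather than the project's code, and they ignore attributes' values, self-closing tags, and comments.

```python
def tokenize(text):
    """Iterative pass: the in_tag flag decides how each character is handled."""
    tokens, buffer, in_tag = [], [], False
    for ch in text:
        if ch == "<" and not in_tag:
            # Leaving text: emit whatever non-blank text was collected
            content = "".join(buffer).strip()
            if content:
                tokens.append(("text", content))
            buffer, in_tag = [], True
        elif ch == ">" and in_tag:
            # Leaving a tag: classify it as an opening or closing tag
            name = "".join(buffer).strip()
            if name.startswith("/"):
                tokens.append(("close", name[1:].strip()))
            else:
                tokens.append(("open", name.split()[0]))  # drop attributes
            buffer, in_tag = [], False
        else:
            buffer.append(ch)
    return tokens


def parse_children(tokens, pos=0, parent=None):
    """Recursive pass: collect children until the parent's closing tag."""
    children = []
    while pos < len(tokens):
        kind, value = tokens[pos]
        if kind == "text":
            children.append(value)
            pos += 1
        elif kind == "open":
            # Recurse to gather this element's own children
            nested, pos = parse_children(tokens, pos + 1, value)
            children.append({"tag": value, "children": nested})
        else:  # closing tag
            if value != parent:
                raise ValueError(f"unexpected </{value}>")
            return children, pos + 1
    if parent is not None:
        raise ValueError(f"missing </{parent}>")
    return children, pos


tree, _ = parse_children(tokenize("<a><b>hi</b><c>x</c></a>"))
print(tree)
```

The recursive version rejects mismatched tags outright, which hints at why malformed markup needs extra handling.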
Problems


Malformed XHTML forces extensive testing of
data mining for each new site
Huge variety in formatting of different websites
makes mining slow
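
The malformed-markup problem can be demonstrated with Python's standard library. A strict XML parser rejects a fragment with an unclosed tag, while the lenient `html.parser.HTMLParser` still yields its text. The fragment and the helper names `strict_parses` and `visible_text` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Lenient parser that keeps the non-blank text it encounters."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(markup):
    """Return the text content of possibly malformed markup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered trailing text
    return " ".join(collector.chunks)


def strict_parses(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical malformed fragment: the <b> tag is never closed
BROKEN = "<div><p>Price: <b>12.50</p></div>"
print(strict_parses(BROKEN))  # prints False
print(visible_text(BROKEN))   # prints Price: 12.50
```

A lenient parser recovers the text but not a reliable tree structure, so extraction rules still have to be tested against each site.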
Results and Conclusions

Actual data, rather than test data, can be fed into
the regression algorithms within a matter of days
once Reuters and a few other select websites are
made parseable with new algorithms.