Using Bayesian Networks for Water Quality Prediction in Sydney

Using Bayesian Networks to Predict Water Quality in Sydney Harbour
Final Presentation
Name: Shannon Watson
Supervisors: Ann Nicholson & Charles Twardy
Introduction

The application domain
Bayesian Networks
Knowledge Engineering
Weka
Results
Conclusion and Future Work
The Domain - Sydney Harbour

Water quality for recreational use
Beachwatch / Harbourwatch programs
Bacteria samples used as pollution indicators
Many variables influence bacterial levels – rainfall, tide, wind, sunlight, temperature, pH, etc.
Past studies

Hose et al. used a multidimensional scaling model of Sydney Harbour – low predictive accuracy
- unable to handle the noisy bacteria samples

Other models developed by the USEPA to model estuaries:
QUAL2E – steady-state receiving-water model
WASP – time-varying dispersion model
EFDC – 3D hydrodynamic model
The EPA in Sydney wants a model that applies causal knowledge of the domain
Bayesian Networks

Directed acyclic graphs
Nodes = random variables/uncertain quantities
Links/arcs = causal relationships between variables
Strengths of the links are quantified by conditional probabilities
Uses Bayes' theorem to update probabilities when evidence is observed
The graphical structure allows non-technical people to understand a technical model
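To make these points concrete, here is a minimal Python sketch of exact inference by enumeration in a three-node network, Rain → Bacteria ← Tide. The node names, states and every probability are hypothetical illustrations, not values from the EPA data or from the networks built in this project.

```python
# Hypothetical priors (made up for illustration, not from the EPA data)
P_RAIN = {"heavy": 0.2, "light": 0.8}
P_TIDE = {"high": 0.5, "low": 0.5}

# Hypothetical conditional probability table: P(Bacteria = "high" | Rain, Tide)
P_BACT_HIGH = {
    ("heavy", "high"): 0.6,
    ("heavy", "low"): 0.8,
    ("light", "high"): 0.1,
    ("light", "low"): 0.2,
}


def joint(rain, tide, bact):
    """Joint probability via the chain rule over the DAG Rain -> Bacteria <- Tide."""
    p_high = P_BACT_HIGH[(rain, tide)]
    p_bact = p_high if bact == "high" else 1.0 - p_high
    return P_RAIN[rain] * P_TIDE[tide] * p_bact


def posterior_rain(bact_evidence):
    """P(Rain | Bacteria = bact_evidence): sum out Tide, then normalise (Bayes' theorem)."""
    unnorm = {
        rain: sum(joint(rain, tide, bact_evidence) for tide in P_TIDE)
        for rain in P_RAIN
    }
    z = sum(unnorm.values())  # P(Bacteria = bact_evidence)
    return {rain: p / z for rain, p in unnorm.items()}


print(posterior_rain("high"))
```

With these made-up numbers, observing a high bacterial reading raises the probability of heavy rain from the prior 0.2 to roughly 0.54 (0.14 / 0.26). Enumeration grows exponentially with the number of variables, so it only suits small illustrative networks.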
Bayesian Networks – An Example
Knowledge Engineering

Used in the software engineering discipline to keep large projects structured
Two major process models
- Spiral Model / Waterfall Model
The Spiral Model can be used when building Bayesian networks to keep control of the network as it grows from a small to a large model
Problems Encountered

Getting the data into a usable format
- solved by learning to use tools such as sed, awk and perl
Understanding the domain – what amount of rainfall would cause high levels of bacteria?
Using Netica, I was unable to determine clear breaks in the rainfall data
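One simple way to look for breaks in rainfall is to split the samples into equal-frequency bins and compare the share of high bacterial readings in each bin. The sketch below does this in plain Python; the function names and the rainfall (assumed in mm) and bacteria values are hypothetical, and it is not Netica's own discretisation method.

```python
def equal_frequency_breaks(values, n_bins):
    """Return n_bins - 1 cut points so each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_index(x, breaks):
    """Index of the bin that x falls into; a value equal to a cut point goes to the upper bin."""
    return sum(1 for b in breaks if x >= b)


def high_rate_per_bin(rain, high, breaks):
    """Fraction of samples with a high bacterial reading in each rainfall bin."""
    counts = [0] * (len(breaks) + 1)
    high_counts = [0] * (len(breaks) + 1)
    for r, h in zip(rain, high):
        i = bin_index(r, breaks)
        counts[i] += 1
        high_counts[i] += int(h)
    return [hc / c if c else float("nan") for hc, c in zip(high_counts, counts)]


# Hypothetical samples: rainfall (mm, assumed) and whether bacteria read "high"
rain_mm = [0, 0, 0.5, 1, 2, 4, 8, 12, 20, 35, 50, 80]
high = [False, False, False, False, True, False, True, True, False, True, True, True]

breaks = equal_frequency_breaks(rain_mm, 3)
rates = high_rate_per_bin(rain_mm, high, breaks)
print(breaks, rates)
```

On this made-up data the cut points come out at 2 mm and 20 mm, and the two upper bins show the same high-reading rate (0.75): the kind of ambiguous pattern that makes a single clear rainfall threshold hard to pick.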
Weka

The Weka package is a Java implementation of a number of machine learners
Implements the industry-standard C4.5 learner
It was hoped that running the machine learners would help in understanding the data
Also ran other machine learners:
Naïve Bayes
AODE
KeoghTan
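Naïve Bayes, the first of the additional learners, assumes the attributes are conditionally independent given the class. A minimal count-based sketch for discrete attributes with Laplace (add-alpha) smoothing follows; it is not Weka's NaiveBayes implementation, and the attributes (a rainfall bin and a tide state) and the training rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and (attribute, class, value) frequencies for discrete attributes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # value_counts[(attr_index, cls)][value] -> count
    domains = defaultdict(set)           # distinct values seen for each attribute
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains, alpha


def predict(model, row):
    """Pick the class maximising log P(y) + sum_j log P(x_j | y)."""
    class_counts, value_counts, domains, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for j, v in enumerate(row):
            # Laplace smoothing: unseen (value, class) pairs get a small non-zero probability
            num = value_counts[(j, y)][v] + alpha
            den = cy + alpha * len(domains[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical training data: (rainfall bin, tide) -> bacteria level
rows = [("heavy", "high"), ("heavy", "low"), ("light", "high"),
        ("light", "low"), ("heavy", "low"), ("light", "low")]
labels = ["high", "high", "low", "low", "high", "low"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("heavy", "low")), predict(model, ("light", "high")))
```

Working in log-probabilities avoids underflow when many attribute probabilities are multiplied, and the smoothing stops a value never seen with a class from forcing that class's probability to zero.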
Results - Entc.

EPA’s Current Model vs. One of My Models
Site        Naïve Bayes   C4.5       EPA Current Model   My Model
Davidson    76(2.5)       75(3.5)    73(3.4)             73(4.0)
Parsley     91(0.6)       93(0.7)    91(-0.2)            91(-0.2)
Woolwick    74(4)         73(5.1)    70(5.0)             70(4.9)
Woodford    72(2.2)       69(3.4)    69(3.4)             69(4)
Predicting Water Quality

Knowledge of the domain has been greatly advanced through the use of Netica to visualise models
The machine learners quickly confirmed the experts' own view that the data is noisy
The domain can't easily be modeled
There is no easy way to state which levels of rainfall lead to high bacterial readings
Conclusion

Use Bayesian networks
Create a model of Sydney Harbour
Aim to predict water quality for the EPA
Use the Spiral Model
Machine learners
Further Work

Evaluate the models
Add more variables to the models and re-evaluate
Recommend models to the Sydney EPA
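Evaluating the models would typically involve k-fold cross-validation: train on k - 1 folds, test on the held-out fold, and pool the results. The sketch below reports percent correct using plain (unstratified) folds; fit_majority and predict_majority are a hypothetical majority-class baseline standing in for a real learner, and any fit/predict pair with the same shape could be passed instead.

```python
import random
from collections import Counter


def k_fold_accuracy(rows, labels, fit, predict, k=10, seed=0):
    """Percent of samples predicted correctly when each is held out once, pooled over k folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split after shuffling
    correct = 0
    for test_idx in folds:
        test_set = set(test_idx)
        train_rows = [rows[i] for i in idx if i not in test_set]
        train_labels = [labels[i] for i in idx if i not in test_set]
        model = fit(train_rows, train_labels)
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return 100.0 * correct / len(rows)


def fit_majority(rows, labels):
    """Hypothetical baseline learner: remember the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    """Baseline prediction ignores the attributes entirely."""
    return model


# Hypothetical data: 8 "low" and 2 "high" bacteria readings
rows = [[i] for i in range(10)]
labels = ["low"] * 8 + ["high"] * 2
print(k_fold_accuracy(rows, labels, fit_majority, predict_majority, k=5))  # 80.0
```

A majority-class baseline like this is a useful reference point: a learner that cannot beat it on noisy data is adding little predictive value.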

Questions

