PythonXMLSimulator
SIMO Python/XML Simulator
Current situation 28/10/2005
SIMO Seminar 28.10.2005
Antti Mäkinen
Dept. of Forest Resource Management /
University of Helsinki
What can be calculated at the moment?
Development of different variables at stand level...
[Figure: SP_1_Age_BA, BA (y-axis) plotted against Age (x-axis)]
[Figure: SP_1_Age_D_gM, D_gM (y-axis) plotted against Age (x-axis)]
[Figure: SP_1_Age_H_dom, H_dom (y-axis) plotted against Age (x-axis)]
[Figure: SP_1_Age_V, V (y-axis) plotted against Age (x-axis)]
[Figure: id:1 stratum:0, i_BA (y-axis) plotted against Age (x-axis)]
What can be calculated at the moment?
Diameter distributions and tree level attributes
[Figure: id:100040543 year:2005 stratum:0, N (y-axis) plotted against d (x-axis)]
Just for comparison with J simulator...
[Figure: four panels, Pine BA vs. Age for ClT, CT, VT and MT; each panel compares the J simulator with the Python/XML Simulator]
[Figure: two panels, Pine H_dom vs. Age MT and Pine V vs. Age MT; each panel compares the J simulator with the Python/XML Simulator]
What can be calculated at the moment?
Estimating forest variable development at both stand
level and tree level is possible at the moment (300+ models
implemented), but:
Forestry operations are not yet implemented in the simulator
→ ”real world” simulations are not yet possible
Bucking models are still not ready
The optimizing module is still missing
How does the simulation process work in SIMO?
[Diagram: SIMULATION PROCESS]
XML Files
Reporter Module. IN: XML data. OUT: transformed XML, graphs (example graph: id:100040543 year:2005 stratum:0, N plotted against d)
SIMULATOR. IN: data, simulation control, model chains, model definitions. OUT: results
MODEL LIBRARY. IN: model name, input variables. OUT: model result, warnings & errors
What is missing?
[Diagram: XML Files, Reporter Module, Validator Module, SIMULATOR, MODEL LIBRARY (drawn several times), Optimizer Module]
XML Files
Data XML
Simulation control XML
Model Chain XML
Model XML
Result XML
Model Library
Includes all models used in the simulator
Programmed in C as a Dynamic Link
Library (DLL)
Models are C functions that are called from the
simulator (model definitions are also in the Model.xml)
Users can add new models to the library or create
additional model libraries
Reports warnings and errors to the simulator
Risk level models not yet implemented
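A minimal sketch of how a simulator written in Python might call a C-style model function and get a status code back alongside the result, using the standard ctypes module. The signature, the status codes and the toy formula are assumptions made for illustration and are not taken from the SIMO model library. So that the example runs without a compiler, a Python callback wrapped with ctypes.CFUNCTYPE stands in for a compiled DLL function; with a real library the function would instead be looked up on an object returned by ctypes.CDLL.

```python
import ctypes

# Hypothetical status codes a model function might return.
OK, WARNING, ERROR = 0, 1, 2

# Assumed C signature: int model(double age, double ba, double *result)
MODEL_SIGNATURE = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.c_double,
    ctypes.c_double,
    ctypes.POINTER(ctypes.c_double),
)


def _toy_model(age, ba, result):
    """Stand-in for a compiled model; the formula is invented."""
    if age <= 0.0:
        return ERROR            # input cannot be used at all
    result[0] = ba / age        # write the result through the pointer
    return WARNING if age > 150.0 else OK


# Wrap the Python function so it is called through the C calling convention.
toy_model = MODEL_SIGNATURE(_toy_model)


def call_model(func, age, ba):
    """Call a model function and return (status, value)."""
    out = ctypes.c_double(0.0)
    status = func(age, ba, ctypes.byref(out))
    return status, out.value
```

Returning the status separately from the value lets the caller decide what to do with a warning without losing the computed result.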
SIMULATOR
The first version of the simulator was programmed in C/C++
Later the programming language was changed to
Python, because of:
Simple and concise syntax → easier readability of
code and the possibility of developing the simulator faster
http://www.python.org
Good compatibility with the C language
A number of useful ready-made open source tools for
a variety of purposes
Code documentation is underway
SIMULATOR
Takes in simulation control instructions, model
chains, model definitions and data in XML format
Transforms the XML data from the different files into the
simulator's own data structure (more efficient than the
ElementTree data structure)
Processes the user-defined model chains for each
computing unit in the data
Calls the model library whenever some value
needs to be calculated (Python/C interface: ctypes)
Prints the resulting values into a result XML file
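The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SIMO implementation: the element and attribute names, the two toy models and the model chain are all hypothetical, and plain Python functions stand in for the C model library.

```python
import xml.etree.ElementTree as ET

# Hypothetical data XML; SIMO's real element and attribute names may differ.
DATA_XML = """
<data>
  <unit id="1"><var name="Age">40</var><var name="BA">20</var></unit>
  <unit id="2"><var name="Age">80</var><var name="BA">30</var></unit>
</data>
"""

# Stand-in for the model library: name -> function of the unit's variables.
MODELS = {
    "grow_age": lambda v: v["Age"] + 5.0,
    "ba_per_age": lambda v: v["BA"] / v["Age"],
}

# Hypothetical model chain: (model name, variable that receives the result).
MODEL_CHAIN = [("grow_age", "Age"), ("ba_per_age", "BA_per_Age")]


def load_units(xml_text):
    """Convert the XML tree into plain dicts: {unit id: {variable: value}}."""
    root = ET.fromstring(xml_text)
    return {
        unit.get("id"): {v.get("name"): float(v.text) for v in unit.findall("var")}
        for unit in root.findall("unit")
    }


def run_chain(units, chain):
    """Apply every model in the chain, in order, to every computing unit."""
    for variables in units.values():
        for model_name, target in chain:
            variables[target] = MODELS[model_name](variables)
    return units


def to_result_xml(units):
    """Serialise the results back to an XML string."""
    root = ET.Element("result")
    for unit_id, variables in units.items():
        unit = ET.SubElement(root, "unit", id=unit_id)
        for name, value in variables.items():
            ET.SubElement(unit, "var", name=name).text = repr(value)
    return ET.tostring(root, encoding="unicode")
```

Converting the tree to plain dictionaries once means the model chain works on simple lookups instead of walking XML elements for every model call.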
Reporting Module
Used for visualizing data & transforming the results
from XML format to other formats
Takes in data and processing instructions in XML
format
At the moment it can plot different kinds of graphs of
given variables (using the matplotlib toolset)
XML transformations to be implemented later...
Missing modules
Optimizer module
• Finds the best alternative from the alternatives
generated by the simulator
• Possibly many alternative optimizing methods?
Validator module
• Validates the XML files with XSD (Schema) files
and by external rules
• Makes sure that the XML files are well-formed and
contain all necessary elements
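A minimal sketch of the second validator task, checking that a file is well-formed and contains the necessary elements. The rule table and element names are hypothetical. XSD validation is not covered here, because the Python standard library has no schema validator and a third-party package would be needed for that part.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: root tag -> child elements that must be present.
REQUIRED_CHILDREN = {"simulation": ["data", "modelchain", "output"]}


def validate(xml_text):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag not in REQUIRED_CHILDREN:
        return [f"unexpected root element <{root.tag}>"]
    return [
        f"missing element <{name}>"
        for name in REQUIRED_CHILDREN[root.tag]
        if root.find(name) is None
    ]
```

For example, `validate("<simulation><data/></simulation>")` reports the two missing child elements, while a file with an unclosed tag is rejected at the parsing step.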
Strengths of SIMO XML Simulator
Virtually any kind of model can be used in the
simulations and added to the model library
User can define the model chains freely for
different kinds of simulations
User can define correction/rectification factors for the
models (e.g. different factors for different geographical areas)
Data levels are not confined to a strict predefined
standard
Extensive warning and error reporting system (risk
control coming later...)
Model risk management – individual variables
Minimum and maximum limits of individual variables have been
defined
Documented in ModelXML
Limits have been coded into the ModelLibrary → it throws warnings if
individual parameter values are out of bounds
How are the minimum and maximum limits defined?
Limits defined by author (caused by data, model shape, …)
Limits of modeling data
The model is tested with those limits using NFI data as test data.
Does the model function properly if the individual parameter
values are out of bounds?
For example: Basal area growth model (Vuokila & Väliaho) for
Scots pine on mineral soils
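The bounds check on individual variables can be sketched in a few lines of Python. The variable names and limit values below are invented for illustration; according to the slides the real limits are documented in ModelXML and the checks are coded into the model library itself.

```python
# Hypothetical limits; real values would come from the model definitions.
LIMITS = {"Age": (5.0, 150.0), "BA": (1.0, 60.0)}


def check_limits(values, limits=LIMITS):
    """Return one warning string per variable outside its [min, max] range."""
    warnings = []
    for name, value in values.items():
        low, high = limits[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value} outside [{low}, {high}]")
    return warnings
```

An empty list means every input is inside its range; the model can still be evaluated when warnings are present, which matches the idea of warning rather than stopping.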
Model risk management – interaction
Interaction between variables
[Figure: accepted combinations of variables, axes age and ba; the points (20, 32) and (120, 5) are marked as not accepted]
Solution alternatives:
Logit model: probability that the estimate is in the acceptable area (at least
linear regression was not flexible enough)
Grid: the area of combinations of variables is divided into cells.
Every cell records whether the estimate is acceptable or not
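The grid alternative could look roughly like this. The cell sizes and the set of accepted cells are invented, and the two points from the figure are assumed to be given in (age, ba) order; only the lookup technique is the point of the sketch.

```python
# Hypothetical cell sizes for the two variables.
AGE_STEP, BA_STEP = 20.0, 10.0

# Hypothetical set of accepted cells, indexed by (age cell, ba cell).
ACCEPTED_CELLS = {
    (0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3),
    (4, 2), (4, 3), (5, 2), (5, 3), (6, 2), (6, 3),
}


def cell_of(age, ba):
    """Map a combination of variables to the index of its grid cell."""
    return int(age // AGE_STEP), int(ba // BA_STEP)


def is_accepted(age, ba):
    """A combination is accepted only if its cell is flagged as accepted."""
    return cell_of(age, ba) in ACCEPTED_CELLS
```

With this made-up grid, both `is_accepted(20, 32)` and `is_accepted(120, 5)` are false, while a mid-range combination such as `is_accepted(60, 25)` is true. The lookup costs a single set membership test, which is the main attraction compared with fitting a model.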
Model risk management
Two levels
1. Individual parameter values out of bounds
2. All individual parameter values acceptable, but is the
specific combination of them acceptable?
Case 1: already in the simulator
Case 2: Suggestion
1. get the k nearest neighbours from the VMI (National Forest Inventory) data,
2. evaluate the model for the data point and the k
nearest neighbours.
3. If the difference for the model estimate between the
data point and the neighbours is too big, generate an
event of ”unacceptable” model estimate
Isn’t that procedure too heavy computationally?
Probably, not yet evaluated
But what if we store the risk evaluation results
and use those primarily:
1. Is it safe to call ModelA with parameters (5, 6, 10)
when we accept risk level X?
2. Has the risk been evaluated with parameter
values (5,6,10) and risk level X before? If yes, get
the answer from a table of risk evaluations
3. If not, get k nearest neighbours for data point
(5,6,10), evaluate the model with (5,6,10) and k
neighbours
4. Store the risk evaluation result and the mean
model result for k neighbours for the data point
(5,6,10) and risk level X
[Figure: two panels, IKA_B (y-axis) plotted against PPA_KUORETON (x-axis)]
Open questions:
When evaluating a model result, should we compare it to:
values derived directly from the nearest VMI
permanent sample plots
OR
model estimates for the nearest VMI sample plots?
Software license for SIMO
Types of Open Source licenses
MIT & Co: “Do whatever you want”
LGPL: “Everything you do to the original
code must be open source, anything on top
of that can be closed”
GPL & Co: “Everything you do is open
source, …well almost”
GPL under the hood: "derivative work" or "mere
aggregation"? Derivative work must be open source, but
aggregation can be closed source
The case of MySQL
Double licensing: open source GPL, commercial
development with a commercial license that allows closed
source
General software architecture
Individual components that communicate over the network
Validator
Simulator – this is well underway
Optimiser
Reporter – simulation results to figures and other data
formats than XML, or different XML format etc.
Implications for licensing? What if one of the
components uses a subcomponent that is published
under the GPL?
Architecture continued
TCP/IP based communication
Security issues?
secured traffic (SSL, SSH)
inside firewall
Scalable