PythonXMLSimulator

SIMO Python/XML Simulator
Current situation 28/10/2005
SIMO Seminar 28.10.2005
Antti Mäkinen
Dept. of Forest Resource Management /
University of Helsinki
What can be calculated at the moment?
 Development of different variables at stand level...
[Figure: SP_1_Age_BA, BA plotted against Age]
[Figure: SP_1_Age_D_gM, D_gM plotted against Age]
[Figure: SP_1_Age_H_dom, H_dom plotted against Age]
[Figure: SP_1_Age_V, V plotted against Age]
[Figure: id:1 stratum:0, i_BA plotted against Age]
 Diameter distributions and tree level attributes
[Figure: id:100040543 year:2005 stratum:0, N plotted against d]
Just for comparison with J simulator...
[Figure: four panels, Pine BA vs. Age for ClT, CT, VT and MT; each panel compares the J simulator and the Python/XML Simulator]
[Figure: two panels, Pine H_dom vs. Age MT and Pine V vs. Age MT; each panel compares the J simulator and the Python/XML Simulator]
What can be calculated at the moment?
Estimating forest variable development at both stand
level and tree level is currently possible (300+ models
implemented), but:
 Forestry operations are not yet implemented in the simulator
→ "real world" simulations are not yet possible
 Bucking models are not ready yet
 The optimizing module is still missing
How does the simulation process work in SIMO?
[Diagram: SIMULATION PROCESS]
XML Files
Reporter Module
IN: XML data
OUT: transformed XML, graphs
[Example graph: id:100040543 year:2005 stratum:0, N plotted against d]
SIMULATOR
IN: data, simulation control, model chains, model definitions
OUT: results
MODEL LIBRARY
IN: model name, input variables
OUT: model result, warnings & errors
What is missing?
XML Files
Reporter Module
Validator Module
SIMULATOR
MODEL LIBRARY (shown four times in the diagram)
Optimizer Module
XML Files
 Data XML
 Simulation control XML
 Model Chain XML
 Model XML
 Result XML
Model Library
 Includes all models used in the simulator
 Programmed in C as a dynamic link library (DLL)
 Models are C functions that are called from the
simulator (model definitions are also in Model.xml)
 Users can add new models to the library or create
additional model libraries
 Reports warnings and errors to the simulator
 Risk level models not yet implemented
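A minimal sketch of how a model call with warning and error reporting might look from the Python side. The C signature, the status codes and the stand-in model are assumptions made for illustration; they are not the actual SIMO model library interface. A Python callable is wrapped as a C function here so that the sketch runs without a compiled DLL.

```python
import ctypes

# Assumed (hypothetical) C signature shared by every model in the library:
#   int model(const double *inputs, int n_inputs, double *result);
# Assumed return convention: 0 = ok, 1 = warning, 2 = error.
MODEL_FUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double),
)

STATUS_TEXT = {0: "ok", 1: "warning", 2: "error"}


def _stand_in_model(inputs, n_inputs, result):
    """Placeholder arithmetic standing in for a compiled C model."""
    if n_inputs < 1:
        return 2  # error: no input variables given
    total = 0.0
    for i in range(n_inputs):
        total += inputs[i]
    result[0] = total
    return 1 if total < 0.0 else 0  # warn on a negative result


# With a real library the function object would come from ctypes.CDLL(...);
# here the Python stand-in is wrapped with the same prototype.
stand_in_model = MODEL_FUNC(_stand_in_model)


def call_model(func, values):
    """Call one model function and return (result, status text)."""
    inputs = (ctypes.c_double * len(values))(*values)
    result = ctypes.c_double(0.0)
    status = func(inputs, len(values), ctypes.byref(result))
    return result.value, STATUS_TEXT[status]
```

Returning a status code next to the result is only one possible way to pass warnings and errors back to the simulator; the slides do not say which mechanism SIMO uses.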
SIMULATOR
 The first version of the simulator was programmed in C/C++
 Later the programming language was changed to
Python (http://www.python.org), because of:
• Simple and concise syntax → more readable code and
faster development of the simulator
• Good compatibility with the C language
• A number of useful ready-made open source tools for
a variety of purposes
 Code documentation is underway
SIMULATOR
 Takes in simulation control instructions, model
chains, model definitions and data in XML format
 Transforms the XML data from the different files into the
simulator's own data structure (more efficient than the
ElementTree data structure)
 Processes the user-defined model chains for each
computing unit in the data
 Calls the model library whenever a value
needs to be calculated (Python/C interface: ctypes)
 Prints the resulting values into a result XML file
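The steps above can be sketched roughly as follows. The XML element names, the model names and the chain format are hypothetical; the slides do not show the real SIMO file formats, and plain Python functions stand in for the C model library.

```python
import xml.etree.ElementTree as ET

# Hypothetical data XML; element and attribute names are assumptions.
DATA_XML = """
<data>
  <stand id="1"><Age>40</Age><BA>18.5</BA></stand>
  <stand id="2"><Age>65</Age><BA>24.0</BA></stand>
</data>
"""

# Stand-in for the model library: model name -> callable (toy arithmetic).
MODELS = {
    "age_plus_one": lambda unit: unit["Age"] + 1.0,
    "ba_times_two": lambda unit: unit["BA"] * 2.0,
}

# Hypothetical model chain: (model name, variable that receives the result).
MODEL_CHAIN = [("age_plus_one", "Age"), ("ba_times_two", "BA")]


def load_units(xml_text):
    """Convert the ElementTree structure into plain dicts keyed by unit id."""
    root = ET.fromstring(xml_text)
    units = {}
    for stand in root.findall("stand"):
        units[stand.get("id")] = {var.tag: float(var.text) for var in stand}
    return units


def run_chain(units, chain, models):
    """Apply each model of the chain, in order, to every computing unit."""
    for unit in units.values():
        for model_name, target in chain:
            unit[target] = models[model_name](unit)
    return units


def to_result_xml(units):
    """Serialize the computed values into a result XML string."""
    root = ET.Element("result")
    for unit_id, variables in units.items():
        stand = ET.SubElement(root, "stand", id=unit_id)
        for name, value in variables.items():
            ET.SubElement(stand, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

A call such as `to_result_xml(run_chain(load_units(DATA_XML), MODEL_CHAIN, MODELS))` runs the whole pass once for every stand in the data.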
Reporting Module
 Used for visualizing data & transforming the results
from XML format to other formats
 Takes in data and processing instructions in XML
format
 At the moment it can plot different kinds of graphs of
given variables (using the matplotlib toolset)
 XML transformations to be implemented later...
Missing modules
 Optimizer module
• Finds the best alternative from the alternatives
generated by the simulator
• Possibly many alternative optimizing methods?
 Validator module
• Validates the XML files with XSD (Schema) files
and by external rules
• Makes sure that the XML files are well-formed and
contain all necessary elements
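A minimal sketch of the two simplest validator checks: well-formedness and the presence of required elements. The element names are hypothetical. XSD validation is not covered, because the Python standard library has no XML Schema validator and a third-party package would be needed for that part.

```python
import xml.etree.ElementTree as ET

# Hypothetical required child elements of a simulation control file.
REQUIRED = ("data", "modelchain", "output")


def validate(xml_text, required=REQUIRED):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The parser rejects anything that is not well-formed XML.
        return ["not well-formed: %s" % exc]
    return ["missing element: %s" % name
            for name in required if root.find(name) is None]
```

For example, `validate("<control><data/></control>")` reports the two missing elements, while unbalanced tags are reported as not well-formed.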
Strengths of SIMO XML Simulator
 Virtually any kind of model can be used in the
simulations and added to the model library
 Users can define the model chains freely for
different kinds of simulations
 Users can define correction/rectification factors for the
models (e.g. different factors for different geographical areas)
 Data levels are not confined to a strict predefined
standard
 Extensive warning and error reporting system (risk
control coming later...)
Model risk management – individual variables
 Minimum and maximum limits of individual variables have been
defined
 Documented in ModelXML
 Limits have been coded into ModelLibrary -> throws warnings if
the individual parameter values are out of bounds
 How are the minimum and maximum limits defined?
• Limits defined by the author (caused by data, model shape, …)
• Limits of the modeling data
• The model is tested with those limits using NFI data as test data.
Does the model function properly if the individual parameter
values are out of bounds?
 For example: Basal area growth model (Vuokila & Väliaho) for
Scots pine on mineral soils
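The per-variable limit check could look roughly like this. The model name, the variable names and the limit values are invented for illustration; in SIMO the limits are documented in ModelXML and coded into the C model library.

```python
# Hypothetical limits per model and input variable: (minimum, maximum).
LIMITS = {
    "ba_growth_pine": {"Age": (5.0, 150.0), "BA": (1.0, 60.0)},
}


def check_limits(model_name, inputs, limits=LIMITS):
    """Return one warning string for every input that is out of bounds."""
    warnings = []
    for name, value in inputs.items():
        low, high = limits[model_name][name]
        if not low <= value <= high:
            warnings.append("%s: %s=%s outside [%s, %s]"
                            % (model_name, name, value, low, high))
    return warnings
```

An empty list means that every individual value is within its limits; it says nothing yet about whether the combination of values is plausible.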
Model risk management – interaction
 Interaction between variables
[Figure: region of accepted combinations of variables, axes age and ba; the example points (20, 32) and (120, 5) are not accepted]
 Solution alternatives:
• Logit model: probability that the estimate is in the acceptable area (at least
linear regression was not flexible enough)
• Grid: the area of combinations of variables is divided into cells.
Every cell records whether the estimate is acceptable or not
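A minimal sketch of the grid alternative. The cell sizes and the set of accepted cells are made-up numbers, and the two example points from the slide are read as (age, ba) pairs, which is an assumption.

```python
# Assumed cell sizes; all numbers here are illustrative only.
AGE_STEP = 20.0
BA_STEP = 10.0

# Cells (age index, ba index) in which a model estimate is accepted.
ACCEPTED_CELLS = {
    (0, 0), (0, 1), (1, 1), (1, 2),
    (2, 2), (3, 2), (4, 2), (5, 2),
}


def cell_of(age, ba):
    """Map a combination of variable values to its grid cell."""
    return int(age // AGE_STEP), int(ba // BA_STEP)


def is_accepted(age, ba):
    """Look up whether the combination falls into an accepted cell."""
    return cell_of(age, ba) in ACCEPTED_CELLS
```

The lookup is a single set membership test, so it is cheap at simulation time; the cost lies in deciding beforehand which cells to accept.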
Model risk management
 Two levels
1. Individual parameter values out of bounds
2. All individual parameter values acceptable, but is the
specific combination of them acceptable?
 Case 1: already in the simulator
 Case 2: Suggestion
1. Get the k nearest neighbours from the VMI data.
2. Evaluate the model for the data point and the k
nearest neighbours.
3. If the difference in the model estimate between the
data point and the neighbours is too big, generate an
event of "unacceptable" model estimate.
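The suggested procedure for case 2 might be sketched as follows. The reference points stand in for VMI data, the toy model and the tolerance are invented, and comparing against the mean of the neighbours' estimates is one possible reading of "the difference"; none of this is taken from the SIMO code.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point, reference, k):
    """Step 1: the k reference points closest to the data point."""
    return sorted(reference, key=lambda ref: distance(point, ref))[:k]


def is_unacceptable(model, point, reference, k, tolerance):
    """Steps 2 and 3: compare the estimate with the neighbours' mean."""
    neighbours = k_nearest(point, reference, k)
    neighbour_mean = sum(model(*n) for n in neighbours) / len(neighbours)
    return abs(model(*point) - neighbour_mean) > tolerance
```

The distance here is computed on unscaled variables, so a variable with a large numeric range dominates the neighbour search; some scaling would probably be needed in practice.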
 Isn’t that procedure too heavy computationally?
 Probably, not yet evaluated
 But what if we store the risk evaluation results
and use those primarily:
1. Is it safe to call ModelA with parameters (5, 6, 10)
when we accept risk level X?
2. Has the risk been evaluated with parameter
values (5, 6, 10) and risk level X before? If yes, get
the answer from a table of risk evaluations
3. If not, get k nearest neighbours for data point
(5,6,10), evaluate the model with (5,6,10) and k
neighbours
4. Store the risk evaluation result and the mean
model result for k neighbours for the data point
(5,6,10) and risk level X
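Storing the evaluations could be sketched as a simple lookup table in front of the expensive evaluation. The class name and the shape of the evaluation function are assumptions for illustration.

```python
class RiskTable:
    """Table of risk evaluations keyed by model, parameters and risk level."""

    def __init__(self, evaluate):
        # evaluate(model_name, params, risk_level) -> (safe, neighbour_mean)
        # is the expensive k-nearest-neighbour evaluation (hypothetical).
        self._evaluate = evaluate
        self._table = {}
        self.evaluations = 0  # how many times the expensive path was taken

    def is_safe(self, model_name, params, risk_level):
        key = (model_name, tuple(params), risk_level)
        if key not in self._table:
            # Not evaluated before: run the evaluation and store the
            # result together with the mean model result of the neighbours.
            self.evaluations += 1
            self._table[key] = self._evaluate(model_name, params, risk_level)
        safe, _neighbour_mean = self._table[key]
        return safe
```

An exact-match table only pays off when the same parameter values recur; with continuous variables the values would probably have to be rounded or binned before being used as a key.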
[Figure: two panels with axes IKA_B and PPA_KUORETON]
 Open questions:
When evaluating a model result, shall we compare it to:
values derived directly from the nearest VMI
permanent sample plots
OR
model estimates for the nearest VMI sample plots?
Software license for SIMO
 Types of Open Source licenses
MIT & Co: “Do whatever you want”
LGPL: “Everything you do to the original
code must be open source, anything on top
of that can be closed”
GPL & Co: “Everything you do is open
source, …well almost”
 GPL under the hood: "derivative work" or "mere
aggregation"? Derivative work must be open source, but
aggregation can be closed source
 The case of MySQL
Double licensing: open source GPL, commercial
development with a commercial license that allows closed
source
General software architecture
 Individual components that communicate over the network
Validator
Simulator – this is well underway
Optimizer
Reporter – simulation results to figures and other data
formats than XML, or different XML format etc.
 Implications for licensing? What if one of the
components uses a subcomponent that is published
under the GPL?
Architecture continued
 TCP/IP based communication
Security issues?
secured traffic (SSL, SSH)
inside firewall
 Scalable