Health GIS 2008, Bangkok, Thailand
Data Warehousing
For JalaSRI
To Help Transition To Better Governance
And Data Empowerment
JalaSRI
Watershed Surveillance & Research Institute
Jalgaon, India
Vinay Dharmadhikari & Sanjay Pawde
1
Outline
1. Background
2. Motivation
3. Scope
4. Data Storage And Processing Strategies
5. Data Warehousing
6. Design Strategy And Sequence Of Steps
7. Data Warehouse Configuration
8. Data Warehouse Architecture Review & Design
9. Illustrative Data Sources In Context Of JalaSRI
10. Illustrative Data Requirements
11. The Path Forward..!
2
Background
• There is a need to make the necessary data, and the tools and
techniques for handling it, available to decision makers, planners,
researchers and public institutions.
• The ultimate goal is to develop an integrated system for data
sharing, data access and data use, for solving locale-specific
problems.
• Geo-informatics as conceived here will address all the vital
elements, viz. geographic measurements, geo-accounting, spatial
analysis and integrated spatio-temporal decision-making.
3
Motivation
• The data management practices at district level are not yet fully
geared to collect the information needed.
• The conventional methods of data collection / collation and storage are
not amenable to easy, quick updating, retrieval and holistic
analysis.
• District-level data management requires an integrated approach,
data analysis tools and a large matrix of sectoral data in digital
format.
4
Motivation (contd…)
• There is a critical need for data empowerment of people,
communities and institutions of self-governance, to enable
informed decision-making.
• Identification of subject-relevant information/data for JalaSRI.
• Identification of relevant, valid data sources and of requirements for
historical and real-time data in consistent formats, to reduce
data redundancy.
5
Scope
• In this initiative, subject-relevant requirements are being
progressively identified, recorded and refined by the JalaSRI team.
• Based on the information and the data analysis, the various JalaSRI
working groups are in the process of developing appropriate
recommendations.
• Socio-economic impact and on-ground evaluation data are being loaded
into the data warehouse.
6
Types Of Data Encountered @JalaSRI
• Tabular (Ex: Transaction data)
– Relational
– Multi-dimensional
• Spatial (Ex: Remote sensing data)
• Temporal (Ex: Log information)
– Streaming (Ex: multimedia, network traffic)
– Spatio-temporal (Ex: GIS)
• Tree (Ex: XML data)
• Graphs
• Sequence
• Text, Multimedia …
7
Evolving Spatial & Non-Spatial Databases At JalaSRI
[Figure: subject databases surrounding the central Spatial & Non-Spatial Data Warehouse]
• Forest & Biodiversity Data
• Weather Data
• Watershed Data
• GIS Data
• Agriculture Data
• Health Data
• Poverty & Unemployment Data
• Socio-economic Data
• Financial Data
8
Data Storage And Processing Strategies
• Design to ensure adequate data storage and an efficient
transaction-processing environment.
• The main concern in the JalaSRI database (DB) system is to
ensure concurrent access and recovery techniques that guarantee
data consistency.
• On-Line Transaction Processing (OLTP) systems are designed to
manage a high number of concurrent transactions and offer
the functionality of on-line interaction.
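The consistency guarantee mentioned above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than the JalaSRI database, and the table name, column names and readings are hypothetical. A batch of inserts either commits as a whole or is rolled back as a whole.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate atomic transactions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE well_reading (well_id TEXT, level_m REAL CHECK (level_m >= 0))"
)
conn.commit()

def record_readings(conn, readings):
    """Insert all readings in one transaction; keep none of them on error."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO well_reading (well_id, level_m) VALUES (?, ?)",
                readings,
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = record_readings(conn, [("W1", 12.5), ("W2", 9.0)])
# The second row violates the CHECK constraint, so "W3" is rolled back too.
bad = record_readings(conn, [("W3", 7.2), ("W4", -1.0)])
count = conn.execute("SELECT COUNT(*) FROM well_reading").fetchone()[0]
print(ok, bad, count)
```

After the failed batch the table still holds only the two rows from the first batch, which is the behaviour a recovery technique is expected to preserve.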
9
Data Storage And Processing Strategies (contd..)
• Information System design for multiple views, snapshots and
manipulations of data.
• Logically separate the Information System from the Operational
Database System.
• The ultimate goal is to evolve into a Decision Support System (DSS).
10
Why Data Warehousing?
[Figure: the Data Warehouse supports two levels of decision making]
Data Warehouse outputs: statistical summarization, charts / graphs, maps
Operational decision making
• Significance of change / difference?
• Trends?
• Temporal pattern?
• Spatial variation?
Executive decision / policy making
• Relationships with demographics?
• Relationships with social fabric?
• Relationships with neighborhood characteristics?
• Relationships with physical environment?
11
JalaSRI- Data Warehousing
• JalaSRI has adopted the Data Warehouse (DW) as the core
technology for its DSS, to validate and help improve local wisdom.
• The DW is designed with the capability to interface with modern
geo-informatics tools and software.
• The Data Warehouse is designed as a 'reliable source' for such scientific
data.
• At JalaSRI, the DW is perceived and designed as critical software
infrastructure which collects, cleans, integrates and organizes the data.
12
JalaSRI- Data Warehousing (contd..)
• JalaSRI has started building the DW as an overall strategy and a
continuously evolving, refining process.
• The Data Warehouse environment design will give JalaSRI the
capabilities of:
trend identification,
forecasting,
summarization of significant data,
competitive analysis,
and targeted market research.
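As a minimal sketch of what trend identification and forecasting could look like on warehouse data, the following fits a straight line by ordinary least squares and extrapolates one year ahead. The yearly values are hypothetical, and a linear trend is an assumption made for illustration, not a method stated in the slides.

```python
def fit_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly water table depths (metres below ground level).
years = [2003, 2004, 2005, 2006, 2007]
depth_m = [10.0, 11.0, 12.0, 13.0, 14.0]

slope, intercept = fit_trend(years, depth_m)
forecast_2008 = slope * 2008 + intercept
print(f"trend: {slope:.2f} m/year, forecast for 2008: {forecast_2008:.2f} m")
```

A positive slope here would mean the water table is getting deeper each year; real series would need seasonal and spatial effects handled before any such reading.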
13
JalaSRI Data Warehouse Services
[Figure: services around the Spatial Warehouse Data Server]
• Metacontent maintenance
• Version integration
• Topology integration
• Visualization & reporting
• Schema transform
• Load & QA
• Best path
• Feature extraction
• Generalization
14
Design Strategy Adopted
• Accurately identify the information that must be contained in the
Data Warehouse.
• Identify, prioritize and manage the scope of the subject areas to be
included in the Data Warehouse.
• Design for a scalable DW architecture.
• Identify and select the hardware, software and middleware components.
• Design the DW to extract, cleanse, aggregate, transform and validate
the data to ensure accuracy and consistency.
• Define the correct level of summarization to support decision-making.
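The extract, cleanse, validate and aggregate steps listed above can be sketched in a few lines. This is only an illustration: the field names, village names, validity range and the choice of a per-village yearly mean as the summarization level are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracted records; field names and values are illustrative.
raw = [
    {"village": " Village A ", "year": "2007", "depth_m": "12.0"},
    {"village": "Village A", "year": "2007", "depth_m": "14.0"},
    {"village": "Village B", "year": "2007", "depth_m": ""},    # missing value
    {"village": "Village B", "year": "2007", "depth_m": "-3"},  # out of range
    {"village": "Village B", "year": "2007", "depth_m": "20.5"},
]

def cleanse(record):
    """Trim text and parse numbers; return None when a value cannot be parsed."""
    try:
        return {
            "village": record["village"].strip(),
            "year": int(record["year"]),
            "depth_m": float(record["depth_m"]),
        }
    except ValueError:
        return None

def is_valid(record):
    """Assumed validity rule: a named village and a depth between 0 and 500 m."""
    return record["village"] != "" and 0 <= record["depth_m"] <= 500

def aggregate(records):
    """Summarize to one mean depth per (village, year)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["village"], r["year"])].append(r["depth_m"])
    return {key: mean(values) for key, values in groups.items()}

cleaned = [c for c in map(cleanse, raw) if c is not None]
valid = [r for r in cleaned if is_valid(r)]
summary = aggregate(valid)
print(summary)
```

Changing the grouping key in `aggregate` (for example to year only) is how the level of summarization would be adjusted in this sketch.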
15
Design Strategy Adopted (contd..)
• Provide user-friendly, powerful desktop tools.
• Educate the user and community.
• Establish the processes for maintaining, enhancing, and ensuring the
ongoing evolution and applicability of the Warehouse.
• Establish a Data Warehouse Help Desk.
16
Design Strategy And Sequence Of Steps (contd..)
• The Data Warehouse is designed around the major subject areas
of JalaSRI.
• The data within the Data Warehouse is integrated.
• All data in the Data Warehouse is being validated to ensure that it is
accurate and time-consistent.
17
JalaSRI Data Warehouse Configuration
A Data Warehouse design-time configuration, also known as the logical
architecture, includes the following components:
• One Enterprise Data Store (EDS) - a central repository, which
supplies atomic (detail-level) integrated information to the whole
organization.
• One Operational Data Store - a "snapshot" of the enterprise-wide
data at a moment in time.
• One Data Mart - a summarized subset of the enterprise's data specific
to a functional area or department, geographical region, or time period.
• One Metadata Store - catalogue(s) of reference information
about the primary data. Metadata is divided into two categories:
information for technical use, and information for end-users.
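The relationship between the detail-level store, a summarized data mart and the two kinds of metadata can be sketched with Python's built-in sqlite3 module. All table names, column names and values below are hypothetical, and the Operational Data Store is left out for brevity.

```python
import sqlite3

# All table names, column names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Enterprise Data Store: atomic, detail-level records
    CREATE TABLE eds_rainfall (
        station TEXT, taluka TEXT, obs_date TEXT, rain_mm REAL
    );
    -- Metadata Store: one row per table and audience
    CREATE TABLE metadata_store (
        table_name TEXT, audience TEXT, note TEXT
    );
""")
conn.executemany(
    "INSERT INTO eds_rainfall VALUES (?, ?, ?, ?)",
    [
        ("S1", "T1", "2007-06-01", 10.0),
        ("S1", "T1", "2007-06-02", 0.0),
        ("S2", "T1", "2007-06-01", 6.0),
        ("S3", "T2", "2007-07-01", 25.0),
    ],
)
# Data Mart: a summarized subset, here monthly totals per region.
conn.execute("""
    CREATE TABLE mart_rain_monthly AS
    SELECT taluka,
           substr(obs_date, 1, 7) AS month,
           SUM(rain_mm) AS total_mm
    FROM eds_rainfall
    GROUP BY taluka, substr(obs_date, 1, 7)
""")
# Two categories of metadata: technical use and end-user information.
conn.executemany(
    "INSERT INTO metadata_store VALUES (?, ?, ?)",
    [
        ("mart_rain_monthly", "technical", "SUM of eds_rainfall per taluka and month"),
        ("mart_rain_monthly", "end-user", "Total monthly rainfall (mm) per taluka"),
    ],
)
conn.commit()

mart_rows = conn.execute(
    "SELECT taluka, month, total_mm FROM mart_rain_monthly ORDER BY taluka, month"
).fetchall()
print(mart_rows)
```

The mart holds one row per region and month while the detail rows stay in the central store, so analysts can query the summary without touching the atomic data.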
18
Logical View Multi-tier JalaSRI Data Warehouse
[Figure: logical view of the three-tier data warehouse]
Tiers:
• Tier 1: Data files
• Tier 2: Application Servers
• Tier 3: Data Management & Data Server Environment
Components shown:
• GIS & applications
• DB server
• File server, File manager
• Appl. Server
• ORACLE 10 i + with Spatial enhancements
• Integration Tools: tools for modeling, cleaning, integrating and loading data.
• Access Tools: tools for query, analysis and reporting (Web-based preferred).
• Data Management Tools, Application Environments, Data Access Protocols & APIs
• Network Interface APIs - (OGDI, OGC & CGI)
• Meta-Data Management (Repository)
19
JalaSRI DW Architecture Review & Design
• The logical architecture includes:
a central Enterprise Data Store,
an Operational Data Store,
one Data Mart per subject area,
and one Metadata Store.
• After the logical configuration, the
Data Architecture,
Application Architecture,
Technical Architecture and
Support Architecture
are to be defined and designed to physically implement it.
20
DW Architecture Review & Design (contd..)
• Conduct a gap analysis.
• The Data Architecture will define the quality and management
standards for data and metadata.
• The Application Architecture will control the movement of data from
source to user.
21
DW Architecture Review & Design (contd..)
• The Technical Architecture will provide the underlying computing
infrastructure that enables the data and application architectures.
• The Support Architecture will include the software components for
performance management.
• Architecture review and design will guide the development and
refinement of the overall Data Warehouse.
22
9. Illustrative Data Sources In Context Of JalaSRI
S1- AISLUS - All India Soil And Land Use Survey, M/O Agriculture
S2- NBSSLUP - National Bureau Of Soil Survey And Land Use
Planning, Indian Council of Agricultural Research
S3- NNRMS - National Natural Resources Management System,
and SRSAC - State Remote Sensing Application Center, D/O Space
S4- Population Census / NSS
S5- Agriculture Census
S6- Animal Husbandry Census
S7- BPL Census
S8- Land Records
S9- Net Area Sown, crop-wise
S10- Flood Control Agencies
S11- Meteorological Data
S12- Open domain Analysis of Paper on Watershed Surveillance and
Inventions
23
9. Illustrative Data Sources In Context Of JalaSRI (contd..)
S13- Agriculture Research Outputs
S14- Irrigation Department Records
S15- Fertilizer Company Soil Tests
S16- Seed Companies Data
S17- Employment Exchanges
S18- Industry Associations
S19- Vocational Education Surveys
S20- Department Of Forest Data
S21- Bio Diversity Registers From PRIs
S22- Ecological NGOs (WWF etc)
S23- Department Of CO-OP / Registrar
S24- Rural Banks/ Credit Societies
S25- TIFR, TISS databases
S26- Labor Bureau
S27- Factory Statistics from ASI
24
10. Identified Data Requirements For JalaSRI
R1- Latest maps at various scales
R2- Satellite imagery for various seasons
R3- Land use and soil maps
R4- Employment seekers
R5- District irrigation and water needs
R6- Crop information, new varieties, seed/fertilizers, pest management
R7- Land allotment to landless
R8- Credit delivery
R9- Disease Incidences
R10- Water quality
R11- Sanitation data
25
10. Identified Data Requirements For JalaSRI (contd..)
R12- Number of wells, with locations and water levels
R13- Fertilizer usage data
R14- Crop area sown
R15- Climate records
R16- Water table depths
R17- Acreage under HYV
R18- Child labour/bonded labour
R19- Species availability / abundance /scarcity
R20- Web links to relevant data warehouses such as
PASDA, EPA, USGS etc.
26
11. Future Direction
• The field of spatio-temporal data warehouses is new and still not very
well exploited. It needs to integrate knowledge from three
different research topics: data warehouses, spatial databases, and
temporal databases.
• While JalaSRI researchers will have free access to all raw data and
analyses, other collaborators will be given access only through a
designated JalaSRI interface, and a policy on attribution and on feeding
back use/publication of material using our data will be enunciated.
Others may be given access only to public domain material.
• Studies and comparisons will be undertaken of other existing data
warehouses, such as PASDA of Penn State University and those of the EPA and
USGS of the US government, and the lessons learned will be put to use.
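The three access tiers described above could be expressed as a simple rule. The sketch below is one hypothetical reading of that policy; the role names and the function signature are assumptions made for illustration, not part of any JalaSRI system.

```python
from enum import Enum

class Role(Enum):
    """Hypothetical user categories matching the three tiers in the text."""
    RESEARCHER = "jalasri_researcher"
    COLLABORATOR = "collaborator"
    OTHER = "other"

def can_access(role, is_public_domain, via_jalasri_interface):
    """Return True if a request is allowed under the assumed policy."""
    if role is Role.RESEARCHER:
        # Free access to all raw data and analyses.
        return True
    if role is Role.COLLABORATOR:
        # Access only through the designated JalaSRI interface.
        return via_jalasri_interface
    # Everyone else: public domain material only.
    return is_public_domain

print(can_access(Role.COLLABORATOR, is_public_domain=False, via_jalasri_interface=True))
print(can_access(Role.OTHER, is_public_domain=False, via_jalasri_interface=True))
```

The attribution and feedback policy is not modelled here, since the slides state only that it will be enunciated later.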
27
The JalaSRI Data Warehousing Path… !
[Figure: path from source data to decision support (Also see Figure 13.3)]
Numbered stages:
1. Integration and cleaning: Transactional Databases and Flat files into the Data Warehouse
2. Selection and transformation
3. Data Mining: intermediate results, with possible refinement
4. Knowledge discovery and construction: characteristics / associations / patterns / trends
5. Decision Support Use: knowledge / intelligence
Roles: Subject Expert (control / interact), Analyst, Decision Maker (acquire / enhance)
Other labels: User Interface; Concepts / Metadata; Contribute to Domain Knowledge Base; Deploy
28
Thank you
and
Looking forward to
feedback/collaborations!
31