Literature Search - Computer Science and Engineering


6350 Spatio-temporal Data Processing
Course Overview
Yan Huang
Basic Information
 Instructor: Yan Huang (huangyan at unt.edu)
 Meeting place and time: M 2:30-5:20pm, B157
 Office hours: M 12:30-2:30pm
 TA: Sasi Koneru ([email protected])
 Office hours: Monday 10:00 AM to 2:00 PM, F208
Evaluation
 The evaluation scheme will be:
 Class participation - 10%
 Paper analysis and presentation - 25%
 Project - 40%
 Term paper - 30%
Classroom policy
 No computers or laptops unless instructed otherwise.
Paper Analysis I
 Collect 5 or more papers in one sub-area
 Write short summaries (100-200 words each) for 3 of them
 Give a 15-minute presentation on what you learned about this
topic
 The presentation should take an integrated approach: introduce
the motivation of the three papers, give a precise problem
definition, compare and contrast how the three papers approach
the problem and how they validate their results, present
conclusions, and point to future directions if you can identify
any
Paper Analysis II
 Choose and present one paper from the reading list
 Collect two questions from each group
 Ask two questions yourself
 Lead group discussion
 Detailed instructions are available from:
 http://www.cse.unt.edu/~huangyan/6350/paperAnalysis.txt
 One paper every week
Find Related Work
 Need to know the key words
 May need to explore and refine during your search
 Often you can find electronic versions of the papers, especially for
publications related to computer science
 Author’s website
 ACM Digital Library
 IEEE Xplore
 Springer Online
 Google Scholar
 Your school typically subscribes to these publishers
 Search from a computer with IP address belonging to your school
Computer Science Bibliography Collections
 CiteSeer
 http://citeseer.ist.psu.edu/
 DBLP
 http://www.informatik.uni-trier.de/~ley/db/
 Google Scholar
 http://scholar.google.com/
 ACM Digital Library
 http://portal.acm.org/dl.cfm
 IEEE Xplore
 http://ieeexplore.ieee.org/
One Way to Find Related Papers
Reach the vine through a melon: start from one known paper and follow its citation links
Reach a melon along its vine: follow those links to find other related papers
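The vine-and-melon idea can be read as a breadth-first walk over a citation graph. The sketch below is only an illustration under that reading: the function `related_papers`, the paper ids, and the toy graph are hypothetical, and a real search would pull citation links from a source such as DBLP or Google Scholar.

```python
from collections import deque

def related_papers(seed, references, max_hops=2):
    """Breadth-first walk over a citation graph starting from one seed paper.

    `references` maps a paper id to the ids it cites (hypothetical toy data).
    Links are followed in both directions: cited papers and citing papers.
    Returns a dict of paper id -> number of hops from the seed.
    """
    # Reverse index: paper -> papers that cite it.
    cited_by = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)

    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        if hops[paper] == max_hops:
            continue  # do not expand beyond the hop limit
        neighbours = references.get(paper, []) + cited_by.get(paper, [])
        for other in neighbours:
            if other not in hops:
                hops[other] = hops[paper] + 1
                queue.append(other)
    del hops[seed]
    return hops

# Hypothetical graph: "A" cites "B" and "C", "E" cites "A", and so on.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "E": ["A"],
    "F": ["E"],
    "G": ["F"],
}
found = related_papers("A", toy_graph, max_hops=2)
```

Limiting the number of hops keeps the candidate list small enough to screen by title and abstract.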
Term Project
 ACMGIS CUP 2014
 Teams of up to 2 people
 March 3: 10-minute presentation on algorithm design and
cost analysis
 Score is based on the normalized grade of your submission.
Term Paper
 Two choices
 Term paper
 Survey paper
Term paper
 Research oriented
 Key components:
 Problem Statement, Significance of the problem
 Related Work and Our Contributions
 Proposed Approach
 Validation of listed contributions (experimental, analytical)
 Conclusions and Future Work
Survey paper
 Key components
 Problem Statement, Significance of the problem
 Our Contributions (usually it is the categorization/classification
of the research literature)
 A classification of the papers related to the problem. Use a
concept hierarchy, figures, and diagrams if necessary.
 Summarize, classify, contrast, and compare the research
literature according to your classification scheme
 A summary of the trend and future work of this line of
research.
 Conclusion.
Spatial Databases (SDBMS)
 Traditional (non-spatial) database management systems provide:
 Persistence across failures
 Concurrent access to data
 Scalability to search queries on very large datasets that do not fit in the main
memory of a computer
 Efficient processing of non-spatial queries, but not of spatial queries
 Non-spatial queries:
 List the names of all bookstores with more than ten thousand titles.
 List the names of the top ten customers, in terms of sales, in the year 2001.
 Use an index to narrow down the search
 Spatial queries:
 List the names of all bookstores within ten miles of Minneapolis.
 List all customers who live in Tennessee and its adjoining states.
 List all customers who reside within fifty miles of the company headquarters.
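To see why the spatial queries above differ from the non-spatial ones, here is a minimal sketch of the "within fifty miles of headquarters" query as a plain linear scan with a great-circle distance test. The customer names and coordinates are made-up illustrations, and the function names are hypothetical. An SDBMS would avoid the full scan by using a spatial index.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_miles(records, center, radius_miles):
    """Linear scan: names of all records within radius_miles of center."""
    lat0, lon0 = center
    return [name for name, lat, lon in records
            if haversine_miles(lat0, lon0, lat, lon) <= radius_miles]

# Hypothetical data: headquarters near Minneapolis, three customers.
headquarters = (44.9778, -93.2650)
customers = [
    ("Ann", 44.9537, -93.0900),  # near St. Paul
    ("Bob", 46.7867, -92.1005),  # near Duluth
    ("Cy", 44.0121, -92.4802),   # near Rochester, MN
]
nearby = within_miles(customers, headquarters, 50)
```

The distance predicate cannot be answered by an ordinary one-dimensional index on latitude or longitude alone, which is the motivation for the spatial indices discussed later.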
Value of SDBMS
 Examples of non-spatial data
 Names, phone numbers, email addresses of people
 Examples of Spatial data
 Census Data
 NASA satellite imagery - terabytes of data per day
 Weather and Climate Data
 Rivers, Farms, ecological impact
 Medical Imaging
 Exercise: Identify spatial and non-spatial data items in
 A phone book
 A Product catalog
User, Application domains
 Many important application domains have spatial data and queries. Some
examples follow:
 Army Field Commander: Has there been any significant enemy troop
movement since last night?
 Insurance Risk Manager: Which homes are most likely to be affected in
the next great flood on the Mississippi?
 Medical Doctor: Based on this patient's MRI, have we treated somebody
with a similar condition?
 Molecular Biologist: Is the topology of the amino acid biosynthesis gene
in the genome found in any other sequence feature map in the database?
 Astronomer: Find all blue galaxies within 2 arcmin of quasars.
 Exercise: List two ways you have used spatial data. Which
software did you use to manipulate spatial data?
SDBMS
 An SDBMS is a software module that
 can work with an underlying DBMS
 supports spatial data models, spatial abstract data types (ADTs) and a query
language from which these ADTs are callable
 supports spatial indexing, efficient algorithms for processing spatial
operations, and domain specific rules for query optimization
 Example: Oracle Spatial data cartridge, ESRI SDE
 can work with Oracle DBMS
 Has spatial data types (e.g. polygon), operations (e.g. overlap) callable from
SQL3 query language
 Has spatial indices, e.g. R-trees
 IBM: Spatial Option
 Informix: Spatial Datablade
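As a rough illustration of a spatial abstract data type with a callable operation, the sketch below defines a rectangle type with an overlap test, the kind of bounding-box check an R-tree applies at each node. The class `Rect` and its methods are hypothetical and are not the API of Oracle Spatial, ESRI SDE, or any other product named above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, e.g. a minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other):
        """True if the two rectangles share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def union(self, other):
        """Smallest rectangle enclosing both, as used for R-tree parent nodes."""
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

a = Rect(0, 0, 2, 2)
b = Rect(1, 1, 3, 3)
c = Rect(5, 5, 6, 6)
hits = [r for r in (b, c) if a.overlaps(r)]  # filter step of a region search
```

In a real system the overlap test on bounding rectangles is only a filter step; exact geometry is checked afterwards for the surviving candidates.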
SDBMS vs. GIS
 A GIS is software to visualize and analyze spatial data using
spatial analysis functions such as
 Search: thematic search, search by region, (re-)classification
 Location analysis: buffer, corridor, overlay
 Terrain analysis: slope/aspect, catchment, drainage network
 Flow analysis: connectivity, shortest path
 Distribution: change detection, proximity, nearest neighbor
 Spatial analysis/statistics: pattern, centrality, autocorrelation,
indices of similarity, topology (hole description)
 Measurements: distance, perimeter, shape, adjacency, direction
 A GIS uses an SDBMS
 to store, search, query, and share large spatial data sets
SDBMS vs. GIS
 SDBMS focuses on
 Efficient storage, querying, and sharing of large spatial datasets
 Simpler, set-based query operations
 Example operations: search by region, overlay, nearest neighbor, distance,
adjacency, perimeter, etc.
 Spatial indices and query optimization to speed up queries over large
spatial datasets
 SDBMS may be used by applications other than GIS
 Astronomy, Genomics, Multimedia information systems, ...
Issues in SDBMS
 Spatial data model
 Query language
 Query processing
 File organization and indices
 Query optimization, etc.
Spatio-temporal Databases
 Add a temporal dimension
 Examples:
 Trajectories
 Evolving regions
 Moving points
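A trajectory can be sketched as a time-ordered list of (t, x, y) samples, with the position between samples estimated by linear interpolation. This is one common modelling assumption, not the only one; the function name `position_at` and the sample data are hypothetical.

```python
from bisect import bisect_right

def position_at(trajectory, t):
    """Estimate (x, y) at time t by linear interpolation.

    `trajectory` is a list of (t, x, y) tuples sorted by strictly
    increasing t. Returns None when t is outside the sampled time span.
    """
    times = [sample[0] for sample in trajectory]
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)  # index of the first sample strictly after t
    if i == len(times):
        return trajectory[-1][1:]  # t equals the last timestamp
    t0, x0, y0 = trajectory[i - 1]
    t1, x1, y1 = trajectory[i]
    f = (t - t0) / (t1 - t0)
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

# Hypothetical moving point sampled every 10 time units.
traj = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 5.0)]
mid_first_leg = position_at(traj, 5)
mid_second_leg = position_at(traj, 15)
```

Queries such as "where was the object at time t" or "which objects were inside a region during an interval" build on this kind of representation.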
Geo-stream databases
 Many data are generated continuously
 Transaction data
 Network monitoring
 Financial applications
 Most recent data are commonly queried in a one-pass fashion
 Monitoring
 Aggregation
 Database systems provide abstractions and declarative
languages that stream processing can benefit from
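One-pass querying of the most recent data is often expressed as a sliding-window aggregate. The sketch below keeps a time-based window and a running sum so each reading is processed once; the class name `SlidingAverage` and the sample readings are hypothetical, and real stream engines offer this through declarative window clauses instead.

```python
from collections import deque

class SlidingAverage:
    """Average of the values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.items = deque()  # (timestamp, value) pairs inside the window
        self.total = 0.0

    def add(self, timestamp, value):
        """Consume one reading (timestamps must be non-decreasing)."""
        self.items.append((timestamp, value))
        self.total += value
        # Expire readings that have fallen out of the window.
        while self.items[0][0] <= timestamp - self.window:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)

avg = SlidingAverage(window_seconds=10)
results = [avg.add(ts, v) for ts, v in [(0, 10.0), (5, 20.0), (12, 30.0), (30, 40.0)]]
```

Memory use is bounded by the number of readings inside the window, not by the length of the stream.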
Stream Applications
 Environmental monitoring
 Patient monitoring
 Finance
 Network monitoring
 Click-streams
 Transaction monitoring
 Traffic analysis
 Moving object queries
 Sensor network
 RFID
Sample Applications
 Environmental monitoring
 Notify me when UV is high, temperature is low
 Traffic monitoring
 Traffic jam: aggregated speed much below the speed limit on a road
segment for an extended time
 Accident: a vehicle in an unintended space, e.g. on a highway, for longer
than the expected time
 Click-streams
 Find the school districts of the houses that the user browses
most.
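The traffic-jam condition above can be sketched as a continuous query over a stream of per-segment speed readings. The thresholds (half the speed limit, five minutes), the function `detect_jams`, and the segment name are assumptions chosen for illustration.

```python
def detect_jams(readings, speed_limits, ratio=0.5, min_duration=300):
    """Flag segments whose speed stays below ratio * limit for min_duration seconds.

    `readings` is a time-ordered iterable of (timestamp, segment, avg_speed).
    Returns a list of (segment, slow_since, alert_time) tuples.
    """
    slow_since = {}  # segment -> time the current slow period started
    alerted = set()  # segments already reported for the current slow period
    alerts = []
    for ts, segment, speed in readings:
        if speed < ratio * speed_limits[segment]:
            start = slow_since.setdefault(segment, ts)
            if ts - start >= min_duration and segment not in alerted:
                alerts.append((segment, start, ts))
                alerted.add(segment)
        else:
            # Speed recovered: reset the state for this segment.
            slow_since.pop(segment, None)
            alerted.discard(segment)
    return alerts

limits = {"seg-1": 60}  # hypothetical segment with a 60 mph limit
stream = [
    (0, "seg-1", 55), (60, "seg-1", 20), (180, "seg-1", 25),
    (360, "seg-1", 15), (420, "seg-1", 18), (480, "seg-1", 58),
    (540, "seg-1", 10),
]
jams = detect_jams(stream, limits)
```

Only a small amount of state per road segment is kept, which is what makes the query suitable for one-pass stream processing.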
Geo-streams
 Current stream systems lack native spatial support
 Spatial stream queries are common in
 traffic monitoring
 environmental monitoring
 moving object databases
Location Privacy
Route prediction
 Next position
 Next stop
 The entire route
 Application:
 Mobile commerce
 Save energy
 Traffic notification
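The slides do not name a prediction method. As one simple possibility, next-stop prediction can be sketched with a first-order Markov model that counts observed stop-to-stop transitions. The function names and trip data below are hypothetical.

```python
from collections import Counter, defaultdict

def train(trips):
    """Count transitions between consecutive stops over past trips."""
    transitions = defaultdict(Counter)
    for trip in trips:
        for here, nxt in zip(trip, trip[1:]):
            transitions[here][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Most frequently observed successor of `current`, or None if unseen."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]

# Hypothetical trip history for one user.
history = [
    ["home", "cafe", "office"],
    ["home", "cafe", "office"],
    ["home", "gym", "office"],
    ["office", "home"],
]
model = train(history)
next_from_home = predict_next(model, "home")
```

Predicting an entire route would need more context than a single previous stop, for example a higher-order model or matching against stored trajectories.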
Location-based social networking
 Social networking with location
 Loopt
 Google Latitude
 Geocache
 Social dynamics
 iPhone applications
Volunteered Geographic Information Systems
 OpenStreetMap
 Wikimapia
 Foursquare
 Trapster
Spatio-temporal Analytics
The analysis of data with both spatial and temporal information
The data are spatially and/or temporally correlated
"Everything is related to everything else, but near things are more related than distant
things." (Tobler's first law of geography)
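One standard way to quantify "near things are more related" is Moran's I, a spatial autocorrelation statistic: I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The sketch below is a minimal illustration; the chain-shaped neighbourhood and the sample values are assumptions, and the helper names are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for `values` given spatial weights {(i, j): w_ij}."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)

def chain_weights(n):
    """Binary weights for n locations on a line; neighbours are adjacent cells."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights

w = chain_weights(4)
smooth = morans_i([1, 2, 3, 4], w)       # neighbours have similar values
alternating = morans_i([1, 4, 1, 4], w)  # neighbours have dissimilar values
```

Positive values indicate that neighbouring locations tend to carry similar values; negative values indicate the opposite.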
Why do we need spatio-temporal analytics?
Analytics helps us describe what happened in the past, understand
what is happening now, predict what will happen in the future, and
make decisions.
The proliferation of sensor devices makes spatio-temporal information
a fundamental component of almost every analytical application.
Types of Spatio-Temporal Analytics Methods
Visualization and exploratory analysis
Segmentation (classification and clustering)
Outlier analysis
Colocation mining
Dependency analysis
Trend discovery
Data Visualization and Exploratory Analysis
Map querying task
Static query (one-time query using map tools available on the interface)
Dynamic query[36] (setup of event alert conditions)
Spatial constraints are expressed using the map, while temporal constraints are
expressed as linear time moments[37]
Map animation[38]
Focusing, linking and arranging views[39]
Map iteration[40]
Existential changes[25]
Location changes
Attribute Changes
Data Visualization and Exploratory Analysis: Example
Segmentation methods
Classification[41]
Spatial classification: decision tree, Bayesian, ANN…
Temporal classification: decision tree, Bayesian, ANN…
Temporal extensions to spatial classification / spatial extensions to temporal
classification
Clustering[42]
Spatial clustering: partitioning method, hierarchical method, density based method,
and grid-based method.
Temporal clustering
Interactive spatio-temporal clustering: perform clustering spatially or temporally
and then test whether the clusters exist in both dimensions (EMM test[43])
Simultaneous spatio-temporal clustering: space-time scan[44]
More on Spatio-Temporal Clustering
Model-based clustering[45]
Define a multivariate density distribution and look for a set of parameters
that fit the model.
Distance-based method
Moving object similarity search
Density-based method
DBSCAN extensions, OPTICS[46]
Flocks and convoys
Moving clusters[47]
Applications: movement data, cellular networks, environment data…
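As an illustration of a density-based DBSCAN extension, the sketch below uses two radii, one for space and one for time, so that points count as neighbours only when they are close in both. It is a simplified sketch under that assumption, not the algorithm of any paper cited here; the function name, parameters, and sample points are hypothetical.

```python
import math

NOISE = -1

def st_dbscan(points, eps_space, eps_time, min_pts):
    """Cluster (x, y, t) points; returns one label per point (NOISE = -1)."""

    def neighbours(i):
        xi, yi, ti = points[i]
        return [j for j, (x, y, t) in enumerate(points)
                if math.hypot(x - xi, y - yi) <= eps_space
                and abs(t - ti) <= eps_time]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)  # includes point i itself
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: relabel, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels

# Two groups at the same place but about 100 time units apart, plus one stray point.
pts = [(0, 0, 0), (0.5, 0, 1), (0, 0.5, 2),
       (0.2, 0.2, 100), (0.3, 0.1, 101), (0.1, 0.3, 102),
       (50, 50, 50)]
labels = st_dbscan(pts, eps_space=1.0, eps_time=5, min_pts=3)
```

A purely spatial DBSCAN would merge the two groups because they overlap in space; the temporal radius keeps them apart.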
Spatio-Temporal Clustering: Example
Spatio-Temporal Outlier Analysis
Definition of outliers
“spatial-temporal object whose thematic attribute values are significantly different
from those of other spatially and temporally referenced objects in its spatial or/and
temporal neighborhoods”.
Methods[48]
Clustering-based approach
Distance-based approach
Computational-geometry-based approach
Spatial-scan-based approach
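The definition above can be sketched with a simple neighbourhood comparison: for each object, take the difference between its attribute value and the mean of its spatio-temporal neighbours, then flag objects whose difference is unusually large relative to all the others. This is an illustrative sketch, not the method of [48]; the thresholds, the function name, and the data are assumptions.

```python
import math
from statistics import mean, pstdev

def st_outliers(objects, eps_space, eps_time, theta=2.0):
    """Indices of (x, y, t, value) objects that differ from their neighbourhood.

    Neighbours are objects within eps_space in space and eps_time in time.
    An object is flagged when the z-score of (value - neighbour mean) exceeds theta.
    """
    diffs = {}
    for i, (xi, yi, ti, vi) in enumerate(objects):
        nbr_values = [v for j, (x, y, t, v) in enumerate(objects)
                      if j != i
                      and math.hypot(x - xi, y - yi) <= eps_space
                      and abs(t - ti) <= eps_time]
        if nbr_values:
            diffs[i] = vi - mean(nbr_values)
    if len(diffs) < 2:
        return []
    all_diffs = list(diffs.values())
    mu, sigma = mean(all_diffs), pstdev(all_diffs)
    if sigma == 0:
        return []
    return [i for i, d in diffs.items() if abs(d - mu) / sigma > theta]

# Hypothetical 3x3 grid of sensors at time 0; the centre reads 35, the rest 20.
grid = [(x, y, 0, 35.0 if (x, y) == (1, 1) else 20.0)
        for x in range(3) for y in range(3)]
outliers = st_outliers(grid, eps_space=1.0, eps_time=1)
```

The comparison is local: a value is judged against its own neighbourhood, so a reading that is normal globally can still be flagged if it disagrees with nearby readings.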
Spatio-Temporal Outlier Detection: Example
Co-Location Mining
Colocation mining finds subsets of Boolean spatial features that are frequently
located in close spatial proximity
Methods[50]
Data mining-based approach
Spatial statistical approach
Buffer-based model
Temporal extension: mixed-drove approach, weighted window-based model[51]
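A common interest measure for colocation patterns is the participation index: for each feature, the fraction of its instances that take part in the pattern, with the minimum of those fractions taken over all features. The sketch below computes it for a two-feature pattern only, in the spirit of the general approach cited as [50]; the function name, distance threshold, and point data are hypothetical.

```python
import math
from itertools import product

def participation_index(instances_a, instances_b, distance):
    """Participation index of the pattern {A, B} under a distance threshold.

    Each argument is a list of (x, y) instance locations of one feature type.
    """
    if not instances_a or not instances_b:
        return 0.0
    # All neighbouring (A instance, B instance) pairs.
    pairs = [(i, j)
             for (i, a), (j, b) in product(enumerate(instances_a),
                                           enumerate(instances_b))
             if math.dist(a, b) <= distance]
    ratio_a = len({i for i, _ in pairs}) / len(instances_a)
    ratio_b = len({j for _, j in pairs}) / len(instances_b)
    return min(ratio_a, ratio_b)

# Hypothetical instances of two feature types.
feature_a = [(0, 0), (10, 10), (20, 20), (30, 30)]
feature_b = [(0, 1), (10, 11), (100, 100)]
pi_ab = participation_index(feature_a, feature_b, distance=2)
```

A pattern is typically reported when its participation index reaches a user-chosen threshold.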
Co-Location Mining: Example
Other methods
Association rule mining
Spatial preprocessing is required to discretize spatial measurements
Methods[49]
Bayesian networks
Hierarchical approach
Trend discovery
Regression
Sequence mining
List of Current Spatio-Temporal Analytics Tools
Commercial
ESRI ArcGIS series
Microsoft SQL Spatial + StreamInsight
Other commercial tools
Open source/free software
Descartes and CommonGIS
MapServer
Other free tools
ESRI ArcGIS Series
ArcGIS Desktop and Server provide the most advanced and complete toolkit
Many extensions are available for different domains
APIs can be used to develop extensions and web or desktop applications for
customized needs. Many other commercial tools, such as CUBE[9], are
built on top of ArcGIS.
ESRI ArcGIS Desktop and Server Extensions[1]
3D Extension (Desktop and Server)
Analyze terrain data, model subsurface features, view and analyze impact zones,
determine optimum facility placement, share 3D views, create a 3D virtual city.
Geostatistical Extension (Desktop and Server)
Visualize, model, and predict spatial relationships.
Link data, graphs, and maps dynamically.
Perform deterministic and geostatistical interpolation.
Evaluate models and predictions probabilistically
Network Extension (Desktop and Server)
Dynamically model realistic network conditions and solve vehicle routing problems
Multipoint optimized routing, time-sensitive turn-by-turn driving directions,
allocation of service areas, determining the fastest fixed route to the closest facility
Schematics Extension (Desktop and Server)
Rapid checking of network connectivity
Automatically generate schematics
Spatial Extension (Desktop and Server)
Comprehensive, raster-based spatial modeling and analysis.
Survey Extension (Desktop)
Capture, edit, and leverage land records using proven survey methodologies
Tracking Extension (Desktop)
Create time series visualizations so you can analyze information relative to time and
location
ESRI Domain-Specific Solutions
ESRI Business Analyst Online
Web-based solution that combines GIS technology with extensive demographic,
consumer spending, and business data for the entire United States to deliver
on-demand, boardroom-ready reports and maps
Perform drive-time analysis
Analyze trade areas
Evaluate sites
Identify the most profitable customers and reach them
ArcGIS Community Analyst
Web-based solution that provides GIS capabilities to analyze data in a
geographic context as granular as congressional districts, block groups, census
tracts, or ZIP Codes.
ArcLogistics
Create optimized routes and schedules based on multiple factors such as
customer needs, business rules, vehicle traits, and street restrictions.
Esri Situational Awareness
Provides a geospatial framework for immediate and long-term situational
awareness needs.
Includes a powerful data fusion and analysis engine; a set of fully customizable
clients for data visualization and analysis; and locally hosted, prerendered data.
Microsoft SQL Server Spatial Library + StreamInsight[2]
Combines the SQL Server spatial library with a stream processing
engine
Integrates the SQL Server spatial library within the StreamInsight engine
Focuses on data stream event processing workflows
GIS support relies on SQL Server (limited) and therefore needs extensive
customization for applications
Other Commercial tools by category
Complete GIS Suite (similar to ArcGIS)
Cadcorp SIS[8], GeoMedia[17], IDRISI[18], MapInfo[19]
Spatio-temporal analysis
STIS[23]
Network (traffic) analysis tools
ACCESSION GIS[3], AltaMap Suite[4], CUBE[9], DYNAMEQ[15], EMME[14]
Terrain analysis
ANUDEM[5]
CAD applications
AutoCAD Civil3D[6]
Emergency and hazard modeling and analysis
CadnaA[10], Calpuff View[11], Caris[12], CATS[13], Floodworks[16]
Specialized analysis
ClusterSeer and BoundarySeer[7] (cluster and boundary analysis), Mathematica[20]
Mathematics toolkit
Matlab Plus toolbox[21], SPSS[22]
Descartes and CommonGIS[24]
An interactive Java-based GIS tool for visualization and exploratory
analysis.
Functionalities
Map and graph visualization (Choropleth maps, scatter plot…)
Basic queries (distance, difference…)
Dynamic queries
Open source and customizable, but lacks advanced GIS analytics
functionality
MapServer[31]
Open source GIS data rendering engine
Functionalities
Advanced cartographic output
Cross platform and APIs for all popular scripting languages
Support many formats
OGC standard compliant
Not a full GIS suite
Other Free/Open Source tools by category
Complete GIS Suite (similar to ArcGIS)
GRASS[28]
Spatio-temporal analysis
Map Comparison Kit[30], STARS[34]
Terrain analysis
Landserf[29]
Exploratory data analysis
GeoDA[26]
Database extension
PostGIS[32]
Specialized analysis
GAM/K[25] (clustering), GRASP[27] (regression)
Mathematics toolkit
R Spatial[33]
Spatio-temporal analytics is becoming a fundamental component of
business analytics
The future
Big data (bigger due to spatio-temporal dimension)
Real time (not only historical spatio-temporal data, but also streaming data that
requires optimization at all levels)
References (I)
[1] http://www.esri.com/software/arcgis/index.html
[2] http://msdn.microsoft.com/en-us/library/ee362541.aspx
[3] http://www.citilabs.com/accession.html
[4] http://www.geomicro.com/
[5] http://fennerschool.anu.edu.au/publications/software/anudem.php
[6] http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&id=8777380
[7] http://www.terraseer.com/products_boundaryseer.php
[8] http://www.cadcorp.com/products_geographical_information_systems/index.htm
[9] http://www.citilabs.com/cube_base.html
[10] http://www.datakustik.com/en/products/cadnaa/
[11] http://www.weblakes.com/calpuff/calpuff_overview.html
[12] http://www.caris.com
[13] http://www.saic.com/products/security/cats/
[14] http://www.inro.ca/en/products/emme/
[15] http://www.inro.ca/en/products/dynameq/
[16] http://www.wallingfordsoftware.com/uk/products/floodworks/
[17] http://www.intergraph.com/sgi/default.aspx
[18] http://www.clarklabs.org/
[19] http://www.pbinsight.com/welcome/mapinfo/
[20] http://www.wolfram.com/products/mathematica/newin7/content/IntegratedGeodesyAndGIS
[21] http://www.mathworks.com/
[22] http://www.spss.com/
[23] http://www.terraseer.com/products_stis.php
References (II)
[24] http://www.esds.ac.uk/international/support/user_guides/gisoverview.asp
[25] http://www.ccg.leeds.ac.uk/software/gam/
[26] http://geodacenter.asu.edu/software
[27] http://www.unine.ch/CSCF/grasp/
[28] http://grass.fbk.eu/
[29] http://www.landserf.org
[30] http://www.riks.nl/products/Map_Comparison_Kit
[31] http://mapserver.gis.umn.edu/
[32] http://postgis.refractions.net/
[33] http://cran.r-project.org/web/views/Spatial.html
[34] http://regionalanalysislab.org/index.php/Main/STARS
[35] P. Compieta, S. Di Martino, M. Bertolotto, F. Ferrucci, and T. Kechadi. 2007. Exploratory spatio-temporal data mining and
visualization. J. Vis. Lang. Comput. 18, 3 (June 2007), 255-279.
[36] C. Ahlberg, C. Williamson, B. Shneiderman, Dynamic queries for information exploration: an implementation and
evaluation, in: Proceedings ACM CHI’92, ACM Press, New York, 1992, pp. 619–626.
[37] M. Harrower, A.M. MacEachren, A.L. Griffin, Developing a geographic visualization tool to support earth science learning,
Cartography and Geographic Information Science 27 (4) (2000) 279–293.
[38] W.L. Hibbard, B.E. Paul, D.A. Santek, C.R. Dyer, A.L. Battaiola, M.-F. Voidrot-Martinez, Interactive visualization of earth
and space science computations, Computer 27 (7) (1994) 65–72.
[39] A. Buja, J.A. McDonald, J. Michalak, W. Stuetzle, Interactive data visualization using focusing and linking, in: Proceedings
IEEE Visualization’91, IEEE Computer Society Press, Washington, 1991, pp. 156–163.
[40] D. Stojanovic, S. Djordjevic-Kajan, A. Mitrovic, Z. Stojanovic, Cartographic visualization and animation of the dynamic
geographic processes and phenomena, in: Proceedings of 19th International Cartographic Conference, Ottawa, Canada, Vol. 1,
1999, pp. 739–746.
References (III)
[41] Kumar, M.; Bhatt, G.; Beeson, P.; Duffy, C. Automated Detection and Spatio-Temporal Classification of Channel
Reaches in Semi-arid Southwestern US Using ASTER. American Geophysical Union, 2006 Joint Assembly.
[42] Tim E. Carpenter, Methods to investigate spatial and temporal clustering in veterinary epidemiology, Preventive
Veterinary Medicine, Volume 48, Issue 4, 29 March 2001, Pages 303-320.
[43] Fosgate, G.T., Carpenter, T.E., Case, J.T., Chomel, B.B., 2000. Time-spatial clustering of human cases of
brucellosis: California, 1973-1992. In: Proceedings of the Ninth International Society on Veterinary Epidemiology and
Economics, Breckenridge, CO
[44] McKenzie, J.S., Pfeiffer, D.U., Morris, R.S., 2000. Spatial and temporal patterns of vector-borne tuberculosis
infection in beef breeding cattle in New Zealand. In: Proceedings of the Ninth International Society on Veterinary
Epidemiology and Economics, Breckenridge, CO
[45] Chudova D, Gaffney S, Mjolsness E, Smyth P (2003) Translation-invariant mixture models for curve clustering. In:
KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining,
ACM, New York, NY, USA, pp 79–88
[46] Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) Optics: ordering points to identify the clustering structure.
SIGMOD Rec 28(2):49–60
[47] Kalnis P, Mamoulis N, Bakiras S (2005) On discovering moving clusters in spatio-temporal data. Advances in Spatial
and Temporal Databases pp 364–381
[48] Birant, D.; Kut, A. Spatio-temporal outlier detection in large databases. 28th International Conference on
Information Technology Interfaces, 2006.
[49] Jeremy Mennis, Jun Wei Liu. Mining Association Rules in Spatio-Temporal Data: An Analysis of Urban
Socioeconomic and Land Cover Change. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9671.2005.00202.x/abstract.
[50] Y. Huang, S. Shekhar, and H. Xiong, “Discovering colocation patterns from spatial datasets: A general approach.,”
IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 12, pp. 1472–1485, 2004
[51] Feng Qian, Liang Yin, Qinming He, Jiangfeng He. Mining spatio-temporal co-location patterns with weighted
sliding window. IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS), 2009.