Sample Big Data Projects


Machine Learning for Spatio-temporal Datasets and Remote Sensing
Remote Sensing for Climate Modeling
Physics-based feature detectors are combined via machine learning
into a compound classifier.
Terabyte-size dataset per simulation
Figure labels: Machine learning; Physics-based
Change Detection
Objective: Identify changes, including damaging changes (due to fires,
insects, or inclement weather conditions) and changes in cropping patterns
(e.g., “Soy” in one year to “Corn” in the next).
Data: MODIS NDVI time series
Existing Studies:
Varun Chandola, Ranga Raju Vatsavai: A scalable Gaussian process
analysis algorithm for biomass monitoring. Statistical Analysis and
Data Mining 4(4): 430-445 (2011)
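
As an illustration of the change-detection objective above, here is a minimal sketch that flags years whose seasonal NDVI profile departs from the earlier, unflagged years. It is not the Gaussian process method of the cited study; it swaps in a simple per-period z-score. The function name `detect_change`, the threshold of 3.0, the 0.01 floor on the standard deviation, and the sample values are all assumptions made for the example, not MODIS data.

```python
from statistics import mean, stdev

def detect_change(ndvi_by_year, threshold=3.0):
    """Return indices of years whose NDVI profile deviates from history.

    ndvi_by_year: list of yearly profiles, each a list of NDVI values
    with one value per composite period (same length every year).
    The first two years are used as the initial baseline.
    """
    flagged = []
    for year in range(2, len(ndvi_by_year)):
        # Baseline = all earlier years that were not flagged as changed.
        history = [ndvi_by_year[y] for y in range(year) if y not in flagged]
        scores = []
        for period, value in enumerate(ndvi_by_year[year]):
            past = [profile[period] for profile in history]
            # Floor sigma (assumed value) so tiny spreads do not explode the score.
            sigma = max(stdev(past), 0.01)
            scores.append(abs(value - mean(past)) / sigma)
        if mean(scores) > threshold:
            flagged.append(year)
    return flagged

# Illustrative, made-up profiles: year 3 loses its mid-season peak.
profiles = [
    [0.20, 0.50, 0.80, 0.40],
    [0.22, 0.48, 0.79, 0.41],
    [0.19, 0.52, 0.81, 0.39],
    [0.20, 0.30, 0.35, 0.25],
    [0.21, 0.50, 0.80, 0.40],
]
print(detect_change(profiles))
```

A score of this kind cannot tell a damaging change from a crop rotation; it only marks where the profile shape differs, which is the first step toward either interpretation.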
Vegetation Classification
Objective: Land Use/Land Cover Classification. Also known as
thematic classification, this is still a challenging task. Challenges include
overlapping signatures, temporal (phenological) dependencies, soil
types and elevation changes, climatic regions, and, moreover, the
subjective definition of a “class.”
Data: Multi-spectral and multi-temporal Landsat-8 imagery
Existing Studies:
Varun Chandola, Ranga Raju Vatsavai: Multi-temporal remote
sensing image classification - A multi-view approach. CIDU 2010:
258-270
Ranga Raju Vatsavai, Shashi Shekhar, Budhendra L. Bhaduri: A
Semi-supervised Learning Algorithm for Recognizing Sub-classes.
ICDM Workshops 2008: 458-467
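
To make the classification objective concrete, the sketch below stacks band values from two acquisition dates into one feature vector per pixel and assigns each pixel to the nearest class centroid. This is a deliberately simple baseline, not the multi-view or semi-supervised methods of the cited studies. The helper names `fit_centroids` and `classify`, the feature layout, and the training values are hypothetical and chosen only for illustration.

```python
from math import dist

def fit_centroids(samples, labels):
    """Average the feature vectors of each class into one centroid."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(pixel, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(pixel, centroids[label]))

# Assumed feature layout: [red_date1, nir_date1, red_date2, nir_date2].
# Values are made up; real Landsat-8 reflectances would replace them.
train_x = [
    [0.05, 0.45, 0.04, 0.50],
    [0.06, 0.47, 0.05, 0.52],
    [0.08, 0.30, 0.05, 0.45],
    [0.09, 0.32, 0.06, 0.43],
]
train_y = ["corn", "corn", "soy", "soy"]

centroids = fit_centroids(train_x, train_y)
print(classify([0.055, 0.46, 0.045, 0.51], centroids))
```

Using both dates in one vector is a basic way to expose phenological differences; classes with overlapping signatures on a single date may separate once the second date is included.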
Urban Neighborhood Classification
Objective: Identify different types of urban neighborhoods in very
high-resolution satellite imagery. “Pixel” and “object” based methods are
often not sufficient to accurately identify different neighborhoods
(e.g., formal vs. informal) in satellite imagery, so novel approaches are
required to characterize higher-order spatial patterns.
Data: Very high resolution (VHR) aerial imagery
Existing Studies:
Object-based: Ranga Raju Vatsavai: Object based image classification:
state of the art and computational challenges. BigSpatial@SIGSPATIAL
2013: 73-80
Multiple-Instance: Ranga Raju Vatsavai: Gaussian multiple instance
learning approach for mapping the slums of the world using very high
resolution imagery. KDD 2013: 1419-1426
Mobility and Physical Activity, Interactive User Interfaces
Actigraph GT3X
Modeling Mobility Behavior
User Internet Information:
1. Spatial and location-based information (buildings)
2. Temporal information (session times and duration)
3. Interest-based information (web domains visited)
4. Load and traffic information (flow rate and packet rate)
Hierarchical Clustering
Change Detection
Terabytes per week
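
The slide names hierarchical clustering as one of the analyses applied to the user features listed above. A minimal sketch of agglomerative clustering with average linkage follows. The function names, the two-feature layout, and the sample values are assumptions for illustration, and the pairwise search shown here would not scale to terabytes per week without sampling or an approximate method.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(cluster) for cluster in clusters)

# Hypothetical per-user features: [mean session hours, share of traffic to
# streaming domains]. Features should be rescaled before real use.
users = [
    [1.0, 0.10],
    [1.2, 0.15],
    [6.0, 0.80],
    [5.8, 0.75],
    [1.1, 0.12],
]
print(agglomerative(users, 2))
```

Stopping at a fixed number of clusters is one choice; cutting the merge sequence at a distance threshold is an equally common alternative.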
Publicly Available Datasets
• http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
• http://www.inside-r.org/howto/finding-data-internet