
Structural Knowledge Discovery Used to Analyze Earthquake Activity
Jesus A. Gonzalez
Lawrence B. Holder
Diane J. Cook
MOTIVATION AND GOAL

Need to analyze large amounts of information in real-world databases.

Find information that standard tools cannot detect.

Application: an earthquake database.

Prior knowledge: spatio-temporal relations.
SUBDUE KNOWLEDGE DISCOVERY SYSTEM

SUBDUE discovers patterns (substructures) in
structural data sets.

SUBDUE represents data as a labeled graph.

Input: labeled vertices and edges.

Output: the discovered patterns and their instances.
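A minimal sketch of the labeled-graph input described above, assuming a plain Python representation. The `LabeledGraph` class and its methods are hypothetical illustrations, not Subdue's own data structures or file format: vertices carry labels, and each edge stores its two endpoints plus a relationship label.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical container for a labeled graph (not Subdue's internal format)."""
    vertex_labels: dict[int, str] = field(default_factory=dict)
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (source, target, label)

    def add_vertex(self, vertex_id: int, label: str) -> None:
        self.vertex_labels[vertex_id] = label

    def add_edge(self, source: int, target: int, label: str) -> None:
        # Both endpoints must already exist so every edge joins two labeled vertices.
        if source not in self.vertex_labels or target not in self.vertex_labels:
            raise KeyError("edge endpoints must be added as vertices first")
        self.edges.append((source, target, label))

    def size(self) -> int:
        """Number of vertices plus number of edges."""
        return len(self.vertex_labels) + len(self.edges)


# One object resting on another; shape attributes are stored as value vertices.
g = LabeledGraph()
g.add_vertex(1, "object")
g.add_vertex(2, "triangle")
g.add_vertex(3, "object")
g.add_vertex(4, "square")
g.add_edge(1, 2, "shape")
g.add_edge(3, 4, "shape")
g.add_edge(1, 3, "on")
print(g.size())  # 7
```

Storing attribute values as their own vertices (rather than as properties of the object vertex) keeps everything in one uniform vertex/edge form, which is what lets a pattern include attribute values.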
EXAMPLE
Vertices: objects or attributes.
Edges: relationships.
Figure: an input graph containing 4 instances of one substructure, an object with shape triangle "on" an object with shape square.
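To make "4 instances" concrete, the sketch below counts occurrences of the fixed triangle-on-square pattern in a small scene graph. It is only fixed-pattern matching under the vertex/edge layout assumed here (object vertices with "shape" edges to value vertices, plus "on" edges); Subdue's actual job is to discover such a pattern without being told what it is. Both function names are hypothetical.

```python
def count_triangle_on_square(vertex_labels, edges):
    """Count 'object(shape=triangle) on object(shape=square)' instances.

    vertex_labels: dict of vertex id -> label; edges: list of (source, target, label).
    """
    # Map each object vertex to the label of its shape value vertex.
    shape_of = {}
    for src, dst, label in edges:
        if label == "shape" and vertex_labels[src] == "object":
            shape_of[src] = vertex_labels[dst]
    # An instance is an "on" edge from a triangle object to a square object.
    return sum(
        1
        for src, dst, label in edges
        if label == "on" and shape_of.get(src) == "triangle" and shape_of.get(dst) == "square"
    )


def build_scene(pairs):
    """Build a graph from (top_shape, bottom_shape) pairs; each top object is on its bottom one."""
    vertex_labels, edges = {}, []
    next_id = 0
    for top_shape, bottom_shape in pairs:
        top, top_val, bottom, bottom_val = range(next_id, next_id + 4)
        next_id += 4
        vertex_labels.update({top: "object", top_val: top_shape,
                              bottom: "object", bottom_val: bottom_shape})
        edges += [(top, top_val, "shape"), (bottom, bottom_val, "shape"), (top, bottom, "on")]
    return vertex_labels, edges


# Four triangle-on-square pairs plus one circle-on-square distractor.
labels, scene_edges = build_scene([("triangle", "square")] * 4 + [("circle", "square")])
print(count_triangle_on_square(labels, scene_edges))  # 4
```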
EVALUATION CRITERION

Minimum Encoding.

Graph Compression.

Substructure size (tried, but did not work).
MINIMUM DESCRIPTION LENGTH

Minimum Description Length (MDL) principle: the best theory to describe a set of data is the one that minimizes the description length (DL) of the entire data set.

DL of the graph: the number of bits necessary
to completely describe the graph.

Search for the substructure that results in the
maximum compression.
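Subdue's MDL heuristic is usually written as the compression value DL(G) / (DL(S) + DL(G|S)), where S is the substructure and G|S is the input graph G with every instance of S replaced by a single vertex; maximizing this ratio is the same as minimizing DL(S) + DL(G|S). The sketch below computes that ratio with a deliberately simplified bit count, which is an assumption for illustration: Subdue's real encoding is more detailed. The function names and the graph sizes in the usage loop are hypothetical.

```python
import math


def description_length(num_vertices, num_edges, num_labels):
    """Simplified bit count for a labeled graph (not Subdue's exact encoding).

    Each vertex stores one label (log2 of the number of distinct labels); each edge
    stores its two endpoints (log2 of the vertex count each) plus one label.
    """
    label_bits = math.log2(max(num_labels, 2))
    vertex_bits = math.log2(max(num_vertices, 2))
    return num_vertices * label_bits + num_edges * (2 * vertex_bits + label_bits)


def compression_value(g_vertices, g_edges, s_vertices, s_edges, instances, num_labels):
    """DL(G) / (DL(S) + DL(G|S)); values above 1 mean S compresses G."""
    dl_g = description_length(g_vertices, g_edges, num_labels)
    dl_s = description_length(s_vertices, s_edges, num_labels)
    # G|S: each instance (assumed non-overlapping) collapses to one vertex that carries
    # a single new label, the name of the substructure.
    c_vertices = g_vertices - instances * (s_vertices - 1)
    c_edges = g_edges - instances * s_edges
    dl_g_given_s = description_length(c_vertices, c_edges, num_labels + 1)
    return dl_g / (dl_s + dl_g_given_s)


# Illustrative sizes: a 100-vertex, 120-edge graph, a 4-vertex, 3-edge substructure,
# 10 distinct labels, and an increasing number of instances.
for n in (0, 5, 10):
    print(n, round(compression_value(100, 120, 4, 3, n, 10), 2))
```

With no instances the ratio falls below 1 (describing S only adds bits), and it grows as more instances are collapsed, which is why the search prefers substructures that are both large and frequent.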
THE EARTHQUAKE DATABASE

Several earthquake catalogs.

Sources such as the National Geophysical Data Center.

Each record has 35 fields describing the earthquake's characteristics.
KNOWLEDGE REPRESENTATION
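Figure: each event is a vertex (EVENT 1, EVENT 2, EVENT 3, ..., EVENT m) with labeled edges to its attribute values, for example Category: PDE_W, Year: 1998, Month: 01, Magnitude: 4.5. Events are linked to each other by Near_in_time and Near_in_distance edges.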
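A sketch of how one catalog record could be turned into the vertex-and-edge form shown in the figure, assuming a record arrives as a dict from field name to value. Only the four fields visible in the figure are used; the real records carry 35 fields, and creating a separate value vertex per event is an assumption of this sketch. `record_to_graph` is a hypothetical helper.

```python
def record_to_graph(record, event_vertex_id):
    """Convert one earthquake record into labeled vertices and edges.

    The event becomes a vertex labeled "Event"; every field becomes an edge labeled
    with the field name, pointing to a new vertex labeled with the field's value.
    Returns (vertex_labels, edges, next_free_vertex_id).
    """
    vertex_labels = {event_vertex_id: "Event"}
    edges = []
    next_id = event_vertex_id + 1
    for field_name, value in record.items():
        vertex_labels[next_id] = str(value)
        edges.append((event_vertex_id, next_id, field_name))
        next_id += 1
    return vertex_labels, edges, next_id


# Field values taken from the figure; a full record would have 35 fields.
record = {"Category": "PDE_W", "Year": "1998", "Month": "01", "Magnitude": "4.5"}
vertices, edges, next_id = record_to_graph(record, event_vertex_id=0)
print(edges[0])  # (0, 1, 'Category')
```

The returned `next_free_vertex_id` lets successive records be added to one graph without id collisions; event-to-event edges are added separately from the prior-knowledge relations.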
PRIOR KNOWLEDGE

Connections between events whose epicenters are close to each other in distance (at most 75 kilometers apart).

Connections between events that happened close to each other in time (at most 36 hours apart).

Spatio-temporal relations are represented with “near_in_distance” and “near_in_time” edges.
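The two relations can be sketched as a pairwise pass over the events. This assumes epicenter distance is measured as great-circle (haversine) distance on a sphere of radius 6371 km and that both thresholds are inclusive; the slides give only the 75 km and 36 hour limits, not how distance was computed. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch
MAX_DISTANCE_KM = 75.0
MAX_TIME_GAP = timedelta(hours=36)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatio_temporal_edges(events):
    """events: list of (event_id, latitude, longitude, datetime); West longitudes negative."""
    edges = []
    for (id1, lat1, lon1, t1), (id2, lat2, lon2, t2) in combinations(events, 2):
        if haversine_km(lat1, lon1, lat2, lon2) <= MAX_DISTANCE_KM:
            edges.append((id1, id2, "near_in_distance"))
        if abs(t2 - t1) <= MAX_TIME_GAP:
            edges.append((id1, id2, "near_in_time"))
    return edges


# Illustrative events: 1 and 2 are about 35 km and 30 hours apart; 3 is far from both.
events = [
    (1, 17.6, -94.8, datetime(1998, 1, 1, 0, 0)),
    (2, 17.9, -94.7, datetime(1998, 1, 2, 6, 0)),
    (3, 18.9, -97.6, datetime(1998, 1, 10)),
]
print(spatio_temporal_edges(events))
```

The all-pairs loop is quadratic in the number of events, which is acceptable for sub-areas holding tens of events, as in the table of sub-areas; larger catalogs would call for a spatial index.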
DETERMINING EARTHQUAKE ACTIVITY

Geologist Dr. Burke Burkart.

Study of the seismic activity caused by the Orizaba Fault.

Fault: a fracture in the Earth's crust along which the rocks on either side have been displaced.

Selection of the area of study, two rectangular regions:

First: longitude 94.0W through 101.0W, latitude 17.0N through 18.0N.

Second: longitude 94.0W through 98.0W, latitude 18.0N through 19.0N.
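A minimal membership test for the two regions. It assumes longitudes are stored as positive degrees West and that boundaries are inclusive; the slides do not say how events exactly on a boundary were handled. `STUDY_REGIONS` and `in_study_area` are hypothetical names.

```python
# Each region as (lon_min, lon_max, lat_min, lat_max); longitudes in degrees West.
STUDY_REGIONS = [
    (94.0, 101.0, 17.0, 18.0),  # first region: 94.0W-101.0W, 17.0N-18.0N
    (94.0, 98.0, 18.0, 19.0),   # second region: 94.0W-98.0W, 18.0N-19.0N
]


def in_study_area(lon_west, lat_north):
    """True if a point (degrees West, degrees North) lies in either region, edges included."""
    return any(
        lon_min <= lon_west <= lon_max and lat_min <= lat_north <= lat_max
        for lon_min, lon_max, lat_min, lat_max in STUDY_REGIONS
    )


print(in_study_area(96.5, 18.5))   # True: inside the second region
print(in_study_area(100.0, 18.5))  # False: north of the first region, west of the second
```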

Figure: Area of study.

Divide the area into 44 rectangles, each half a degree in both longitude and latitude.

Sample the earthquake activity in each sub-area.

Run Subdue on each sub-area.
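The figure of 44 follows from the region sizes: the first region spans 7 × 1 degrees, giving 14 × 2 = 28 half-degree cells, and the second spans 4 × 1 degrees, giving 8 × 2 = 16. The sketch below enumerates the cells. Its numbering order (column by column from west to east, south before north within a column, first region before second) is inferred from the rows listed in the table of sub-areas and may not be the authors' exact procedure; `half_degree_cells` is a hypothetical helper.

```python
def half_degree_cells(west_max, west_min, north_min, north_max, first_number):
    """Split a region into 0.5-degree cells (longitudes in degrees West).

    Cells are numbered column by column from west to east and, inside each column,
    from south to north. Returns (number, (west_from, west_to), (north_from, north_to)).
    """
    cells = []
    number = first_number
    n_cols = round((west_max - west_min) / 0.5)
    n_rows = round((north_max - north_min) / 0.5)
    for col in range(n_cols):
        west_from = west_max - 0.5 * col
        for row in range(n_rows):
            north_from = north_min + 0.5 * row
            cells.append((number, (west_from, west_from - 0.5), (north_from, north_from + 0.5)))
            number += 1
    return cells


first = half_degree_cells(101.0, 94.0, 17.0, 18.0, first_number=1)
second = half_degree_cells(98.0, 94.0, 18.0, 19.0, first_number=len(first) + 1)
areas = first + second
print(len(first), len(second), len(areas))  # 28 16 44
```

Under this ordering, area 26 comes out as 95.0W to 94.5W and 17.5N to 18.0N, matching its row in the table; multiples of 0.5 are exact in binary floating point, so the coordinates compare exactly.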
Area  Longitude        Latitude      Area name  Number of events
1     101.0W-100.5W    17.0N-17.5N   Gue1       62
2     101.0W-100.5W    17.5N-18.0N   Gue2       40
3     100.5W-100.0W    17.0N-17.5N   Gue3       57
4     100.5W-100.0W    17.5N-18.0N   Gue4       13
5     100.0W-99.5W     17.0N-17.5N   Gue5       71
6     100.0W-99.5W     17.5N-18.0N   Gue6       15
7     99.5W-99.0W      17.0N-17.5N   Gue7       35
8     99.5W-99.0W      17.5N-18.0N   Gue8       16
9     99.0W-98.5W      17.0N-17.5N   Gue9       13
10    99.0W-98.5W      17.5N-18.0N   Gue10      14
26    95.0W-94.5W      17.5N-18.0N   Ver1       43
27    94.5W-94.0W      17.0N-17.5N   Oaxver4    35
28    94.5W-94.0W      17.5N-18.0N   Ver2       23
29    98.0W-97.5W      18.0N-18.5N   Pue1       6
30    98.0W-97.5W      18.5N-19.0N   Pue2       0
42    95.0W-94.5W      18.5N-19.0N   Vergolf5   1
43    94.5W-94.0W      18.0N-18.5N   Vergolf4   3
44    94.5W-94.0W      18.5N-19.0N   Vergolf6   1

Substructure 1 (with 19 instances) and substructure 2 (with
8 instances) found in sub-area 26.
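Figure: Substructure 1 (19 instances) and Substructure 2 (8 instances). Substructure 1 links two Event vertices with a Near_in_distance edge and includes the attributes Region_number (61.00) and Category (PDE). Substructure 2 is built around Sub_1 (substructure 1) with the attributes Region_number (61.00), Depth (33.00), Category (PDE), Dept_ctl (N), and Coord_qual.. (%).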
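A vertex labeled Sub_1 inside substructure 2 is consistent with Subdue's iterative mode, in which the graph is compressed by replacing each instance of the best substructure with a single vertex and the search is run again. The sketch below illustrates only that compression step, assuming instances are given as non-overlapping sets of vertex ids; it is not Subdue's implementation, and `compress_instances` plus the small example graph are hypothetical.

```python
def compress_instances(vertex_labels, edges, instances, sub_label):
    """Replace each instance (a set of vertex ids) with one new vertex labeled sub_label.

    Edges inside an instance disappear; edges leaving an instance are redirected to
    the new vertex. Instances are assumed not to overlap.
    """
    new_labels = dict(vertex_labels)
    owner = {}  # original vertex id -> id of the vertex that replaces its instance
    next_id = max(vertex_labels) + 1
    for instance in instances:
        for v in instance:
            owner[v] = next_id
            del new_labels[v]
        new_labels[next_id] = sub_label
        next_id += 1
    new_edges = []
    for src, dst, label in edges:
        src2, dst2 = owner.get(src, src), owner.get(dst, dst)
        if src in owner and src2 == dst2:
            continue  # both endpoints belong to the same instance
        new_edges.append((src2, dst2, label))
    return new_labels, new_edges


# Illustrative graph: events 0 and 1 form one instance (joined by near_in_distance).
labels = {0: "Event", 1: "Event", 2: "61.00", 3: "PDE", 4: "Event"}
graph_edges = [(0, 1, "near_in_distance"), (0, 2, "Region_number"),
               (1, 3, "Category"), (1, 4, "near_in_time")]
print(compress_instances(labels, graph_edges, [{0, 1}], "Sub_1"))
```

After compression, the attribute edges hang off the new Sub_1 vertex, so a later pass can discover patterns such as "Sub_1 with Depth 33.00" without re-deriving substructure 1.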

This pattern might give us information about the cause of
the earthquakes.

Subduction also affects this area, but only at specific depths that depend on the distance from the Pacific Ocean.
SUBDUE’S POTENTIAL

Subdue finds not only shared characteristics of events but also the spatial relations between them.

Dr. Burke Burkart is studying the patterns to give direction to this research.

We expect to find patterns that trace parts of the path of the fault involved.

Time relations not considered by Subdue.

Earthquake’s characteristics.

Important for other areas.
CONCLUSION

Subdue was successful on a real-world database.

Subdue used prior knowledge, in the form of temporal and spatial relations, to guide its search.

Subdue discovered interesting patterns using these
temporal and spatial relations.

Subdue is being used as the data mining tool to study the
“Orizaba Fault” in Mexico.
FUTURE WORK

Concept Learning Subdue

Theoretical analysis.

Bounds on complexity (e.g. PAC learning).

Graphical user interface to visualize substructures and their instances.