Using Neural Networks in Database Mining by Tino Jimenez


Using Neural Networks in
Database Mining
Tino Jimenez
CS157B
MW 9-10:15
February 19, 2009
Data Mining Overview
• What is Data Mining?
• The process of extracting useful patterns and
knowledge from the data in a database
• Why do we need/use it?
• Predictive technology
• Allows for automated decision making
Data Mining Overview (continued)
What problems does it solve?
• Stock market prediction
• Credit card fraud detection
• Loan approval/denial
How does it work?
• Data analysis of a given set of
information
Data Mining Tools
Decision Trees
• A series of rules that allows for automated
decision making. Common use: credit card and health
insurance approvals
Regression
• Analysis of the association between a
dependent variable and one or more independent
variables. Common use: prediction
Neural Networks
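The first two tools above can be sketched in a few lines of Python. This is only an illustration: the function names, the loan thresholds, and the sample numbers are hypothetical and are not taken from the slides.

```python
def approve_loan(income, debt_ratio, late_payments):
    """Hypothetical decision tree written as nested rules (made-up thresholds)."""
    if late_payments > 2:
        return False
    if debt_ratio > 0.4:
        return income >= 80_000
    return income >= 30_000


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one independent variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Usage: classify one applicant, then fit a line and predict a new value.
decision = approve_loan(income=50_000, debt_ratio=0.2, late_payments=0)
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept
```

The decision tree makes its rules explicit, while the regression summarizes the data as a fitted line that can be used for prediction.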
The Basis of Neural Networks
Adapted from research in artificial
intelligence
Based loosely on the biological
functionality of neurons
Mimics the ability to “learn”
A neuron is a specialized cell that sends
an electrochemical signal
The Basis of Neural Networks (cont.)
Each neuron has a specific function and is
grouped with other neurons to be able to
perform complex tasks
Each input to a neuron has a “weight”, which
determines the importance of that input to the
specific function being processed
How Neural Networks Work
An individual neuron has a step activation
function, which means that its output can be
-1, 0, or 1.
• A value of -1 means that it is an inhibitor and
will lessen the weighted sum of the combined neurons
The individual neurons are then connected
to each other as inputs and outputs.
The inputs carry the values of variables of
interest
The outputs form predictions or control signals
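A single neuron of this kind can be sketched as follows. This is a minimal illustration, assuming the step function returns the sign of the weighted sum; the names step and neuron_output are hypothetical.

```python
def step(x):
    """Three-valued step activation: returns -1, 0, or 1 (assumed to be the sign of x)."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


def neuron_output(inputs, weights):
    """Weight each input, add the results, and apply the step activation."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return step(total)


# Usage: the negative weight on the second input acts as an inhibitor.
excited = neuron_output([1, 0], [0.5, -1.0])    # weighted sum is 0.5
inhibited = neuron_output([1, 1], [0.5, -1.0])  # weighted sum is -0.5
```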
How Neural Networks Work (cont.)
Feedforward Structure
• The most useful in solving real-world problems
Signals flow from inputs through hidden units,
eventually to the output units
Input layer is used only to introduce the values
of the input variables
The hidden and output layer neurons are each
connected to all of the units of the preceding
layer
How Neural Networks Work (cont.)
When the network is used, the variable
values are placed in the input units. Each
subsequent layer then calculates the
weighted sum of the outputs of the
preceding layer and passes it through its
activation function, until the final
layer is reached.
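The layer-by-layer calculation can be sketched as a short forward pass. The sigmoid activation and the bias terms are assumptions made for this illustration and do not come from the slides; the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Smooth activation used here in place of a hard step (an assumption)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each unit takes the weighted sum of all outputs of the preceding layer."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs


def feedforward(inputs, layers):
    """Pass the input values through each (weights, biases) layer in turn."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Usage: two inputs, one hidden layer of two units, one output unit.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
prediction = feedforward([1.0, 2.0], layers)
```

The input layer does no computation here; it only supplies the values, as the slides describe.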
How Do You Apply a Neural Network
Exact nature of the relationship between
inputs and outputs will be unknown
Large quantities of data are necessary
Data can be “noisy”
Two ways to set up the network
Supervised Learning
Unsupervised Learning
Supervised Learning
• Data involves historical data sets containing
input variables and the known output that corresponds to them
• Uses training and testing data to build a model
• The training data is what the neural network
uses to “learn” how to predict the known output.
The testing data is held back and used for validation
A well-known algorithm is backpropagation
Uses the data to adjust the weights to minimize the error
in its predictions.
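The weight adjustment can be sketched with a very small network. This is a toy illustration under stated assumptions, not a production implementation: one hidden layer, sigmoid units, squared error, full-batch gradient descent, and a fixed random seed. All names and parameter values are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the single output for input x."""
    xb = list(x) + [1.0]  # trailing 1.0 acts as a bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=2, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        grad_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        grad_out = [0.0] * (n_hidden + 1)
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output unit (half squared error, sigmoid).
            delta_out = (output - target) * output * (1.0 - output)
            hb = hidden + [1.0]
            for j in range(n_hidden + 1):
                grad_out[j] += delta_out * hb[j]
            # Propagate the error signal back to each hidden unit.
            xb = list(x) + [1.0]
            for j in range(n_hidden):
                delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                for i in range(n_in + 1):
                    grad_hidden[j][i] += delta_h * xb[i]
        # Adjust every weight against its averaged gradient.
        for j in range(n_hidden + 1):
            w_out[j] -= rate * grad_out[j] / len(data)
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hidden[j][i] -= rate * grad_hidden[j][i] / len(data)
    return w_hidden, w_out


# Usage: toy training set (logical OR) of (inputs, known output) pairs.
training_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
untrained = train(training_data, epochs=0)
trained = train(training_data, epochs=2000)
error_before = mean_squared_error(training_data, *untrained)
error_after = mean_squared_error(training_data, *trained)
```

In practice the error would also be measured on separate testing data, since a low error on the training data alone does not show that the model generalizes.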
Unsupervised Learning
Less commonly used than supervised learning
Attempts to locate clusters within the input
data, without reference to any output variable
Unsupervised learning only uses input variables
from a training set
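Clustering on input variables alone can be sketched as follows. Note that this uses plain k-means rather than a neural network; it is swapped in only because it is the shortest way to show clusters being found without any output variable. The function name, starting centers, and sample points are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on tuples of numbers, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        assignment = [
            min(range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


# Usage: only input variables are supplied; no known outputs are involved.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centers, assignment = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```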
Advantages to Using a Neural Network
High Accuracy
Able to approximate complex non-linear mappings
Noise Tolerance
Flexible with respect to missing and noisy data
Ease of maintenance
Can be implemented in parallel hardware
Can be updated with new data, making them
dynamic
Disadvantages to Using a Neural Network
Poor Transparency
Operate as “black boxes”, giving little or no
insight into how a prediction was reached
Trial-and-Error Design
The selection of hidden nodes and training
parameters is heuristic
Data Hungry
Requires large amounts of data to be accurate
which also means more computing power
Applications of Neural Networks
Detection of medical phenomena
Recognizes predictive patterns to prescribe
appropriate treatment
Stock market prediction
Large numbers of factors are introduced and
used by technical analysts
Credit assignment
Identifies most relevant characteristics and
classifies applicants as good or bad credit risks