Transcript ppt slides

APPROXIMATE QUERY
PROCESSING
BY
KAVYA REDDY MUSANI
Types of RDBMS
There are two types of RDBMS
 Operational/Transactional Databases:
---Used for day to day operations.
---It changes rapidly.
 DataWarehousing:
---It is useful for decision support and data
analysis.
---Not useful for day to day operations.
---It doesn’t change rapidly.
Analysis of DataWarehousing
1.
2.
3.
4.
Decision support
OLAP tools
Statistics
Data Mining
Ways to improve Query Processing


Indexes
 Indexes are used to speed up querying on huge databases.
 The different kinds of indexes are B+ trees, Hash indexes,
Bitmap indexes, etc.
Materialized Views
 Materialized views improve query performance by precalculating
expensive join and aggregation operations on the databases
prior to execution time and storing these results in the database.
Example of Approximation




Suppose we have “Sales” table showing the sales of
some products with attributes Product_ID,
Product_Name, Price, Quantity and State.
Now we want to know the sales of a product in each
state of a country.
Then we write a query which outputs the total sales of
that product in each state.
Example of query:
Select State as State, Sum(Price) as total_price from
Sales group by State.
Contd…
Contd…




We can approximate the total_price to an integer so that
we can see the behaviour of the product sales in
different states clearly.
For example if for the state of Texas the total_price is
56788.9866 dollars then we can approximate it to
60000.
Now we need to indicate the error in the corresponding
column.
This error is called Confidence Interval because it shows
the amount by which we have approximated the value.
Synopsis Method


Compress the data into a smaller representation (known
as “synopsis”).
Execute query against synopsis and produce answers.
Methods of Data Compression



Sampling.
Building models using statistics.
Curve Fitting.
Types of Synopsis




Random Sampling (lossy compression).
Histograms.
Wavelets.
Learning Joint Distribution.
Random Sampling


A sampling procedure that assures that each element in
the population has an equal chance of being selected is
referred to as simple random sampling.
There are different random sampling techniques like
Simple random sampling, Systematic sampling, Stratified
sampling, Cluster sampling, multi-stage sampling, etc.
Role of Random sampling in
Approximate Query Processing



Suppose we want to find the average of salaries of
employees of a big company for a particular year.
Involving every employee’s salary in the average salary
slows down the processing.
So, we take a small percentage of random sample
(synopsis) of the huge table and find the average salary.
Contd…
Contd…



This improves the speed of processing the
query.
However, care should be taken while extracting
random sample.
If the random sample drawn is assured to be
good, then we can query the random sample
(synopsis) for further references which reduces
the processing time considerably and now we
project the result to the actual table by
multiplying with sampling fraction.