the HOWLER project

Work Term Summary / Presentation
By: Sean McGrath
1
Introduction
• Imagine that you are the CEO of a large
company, with large amounts of transaction
data at your disposal.
• How do you make sense of all the data?
2
Introduction
• Solution: OLAP (On-line Analytical
Processing).
• OLAP = a marketing term, catchier than
"multidimensional database".
• Can present the data to people in a way that
makes sense.
3
Introduction
Example:
You are Amazon’s CEO and you are
wondering if cheaper music CDs sell more
than expensive ones.
Solution…
4
Introduction
5
Introduction
Solution:
Yes. Cheaper items do sell better than more
expensive ones.
6
About HOWLER
What is HOWLER?
• A web-based OLAP tool that provides a way
to perform range queries on multidimensional datacubes with the query results
shown in a graphical format.
7
About HOWLER
What is a datacube?
• To keep things simple, a datacube can be
thought of as a multidimensional array in our
case (using MOLAP).
• The formal definition is more complex than
this.
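As a rough illustration of the "multidimensional array" view, here is a minimal Python sketch of a dense cube stored as one flat list with row-major indexing. The class name DenseCube, its methods, and the 5 x 6 shape are invented for this example; they are not LEMUR's or HOWLER's actual classes.

```python
# Hypothetical sketch: a dense datacube kept in a flat list, row-major order.
class DenseCube:
    def __init__(self, shape):
        self.shape = tuple(shape)
        size = 1
        for extent in self.shape:
            size *= extent
        self.cells = [0.0] * size

    def _offset(self, index):
        """Turn a multidimensional index into a position in the flat list."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of dimensions")
        offset = 0
        for i, extent in zip(index, self.shape):
            if not 0 <= i < extent:
                raise IndexError("index out of range")
            offset = offset * extent + i
        return offset

    def get(self, index):
        return self.cells[self._offset(index)]

    def put(self, index, value):
        self.cells[self._offset(index)] = value


# Invented dimensions: 5 price bands x 6 rating levels.
cube = DenseCube((5, 6))
cube.put((2, 4), 17.0)
print(cube.get((2, 4)))
```

The same offset arithmetic applies whether the cells sit in memory or in a file, which is one reason the array view is convenient.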
8
About HOWLER
What else is HOWLER used for?
• For research purposes, HOWLER also
provides the ability to benchmark various
low-level implementations of datacubes.
9
About HOWLER
More…
• The implementation of HOWLER is based upon the
LEMUR project, which provides a C++ backend for
the datacube objects. HOWLER is a web interface
written in PHP and Python, which uses Python
wrapper classes for each of the different datacube
implementations (written in C++).
10
About HOWLER
About the name:
• HOWLER comes from the name of the
monkey.
• Howler monkeys are to lemurs as the
HOWLER project is to the LEMUR project.
11
About HOWLER
Where does HOWLER’s data come from?
• Some is made up.
• Most is real data collected from Google and
Amazon using both of their web APIs.
12
About HOWLER
Example Queries:
• How does a customer's rating of a music CD
affect its sales rank?
• We will use HOWLER to plot the average
sales rank against customer ratings.
13
About HOWLER
14
About HOWLER
Example Queries:
• How were albums that cost less than $10
and have more than 15 songs rated by
customers?
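A range query like this one can be sketched in Python as follows. The toy cube, its three dimensions (price, number of songs, rating), the upper bound of 30 songs, and the function name range_count are all assumptions made for illustration, not HOWLER's real data or API.

```python
from itertools import product


def range_count(cube, ranges, group_dim):
    """Sum cell counts inside inclusive per-dimension ranges, grouped by one dimension."""
    totals = {}
    axes = [range(lo, hi + 1) for lo, hi in ranges]
    for index in product(*axes):
        key = index[group_dim]
        totals[key] = totals.get(key, 0) + cube.get(index, 0)
    return totals


# Invented toy data: (price in dollars, number of songs, rating 1-5) -> album count.
cube = {
    (8, 16, 4): 3,
    (9, 20, 5): 2,
    (9, 12, 5): 7,   # too few songs, so excluded by the query below
    (15, 18, 4): 4,  # too expensive, so excluded by the query below
    (5, 17, 2): 1,
}

# Price under $10, more than 15 songs, any rating; group by rating (dimension 2).
result = range_count(cube, [(0, 9), (16, 30), (1, 5)], group_dim=2)
print(result)
```

Each key of the result is a rating and each value is the number of matching albums, which is the kind of series a bar chart can display.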
15
About HOWLER
16
HOWLER Front End
Features:
• You can choose from several different
datacubes to query.
• You can limit the range for all numeric
attributes (or dimensions).
17
HOWLER Front End
Features (continued):
• Uses metadata stored in a separate file for
presenting all information on a given cube
and its attributes.
• Plots query results as a bar chart.
18
HOWLER Front End
Features (continued):
• Query types = Count, Average, Standard
Deviation, and Average with Standard
Deviation.
• Can run the query on various low-level
implementations of datacubes.
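One common way to support these query types is to keep a count, a sum, and a sum of squares per cell, so that results for any range can be combined cheaply. The slides do not say whether HOWLER works this way; the sketch below only assumes it, and the function name summarize and the sample cells are invented.

```python
import math


def summarize(cells):
    """Combine (count, total, total_of_squares) triples into count, mean, and std dev."""
    n = sum(c[0] for c in cells)
    total = sum(c[1] for c in cells)
    total_sq = sum(c[2] for c in cells)
    if n == 0:
        return 0, None, None
    mean = total / n
    # Population variance; clamp at zero to absorb floating-point rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return n, mean, math.sqrt(variance)


# Invented cells: ratings [4, 5] fell in one cell and [3] in another.
cells = [(2, 9.0, 41.0), (1, 3.0, 9.0)]
count, average, std_dev = summarize(cells)
print(count, average, std_dev)
```

The "Average with Standard Deviation" query would return the last two values together, for example to draw error bars on the chart.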
19
HOWLER Front End
Features (continued):
• Can benchmark the various datacube
implementations.
• Can return the data in numeric format for
spreadsheets and more advanced graphing
utilities.
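Benchmarking several implementations on the same query could look roughly like the sketch below. The two stand-in implementations and the function name benchmark are hypothetical; a real run would call the wrapped datacube classes instead.

```python
import time


def benchmark(implementations, query, repeats=5):
    """Return the best wall-clock time in seconds for each named implementation."""
    timings = {}
    for name, run_query in implementations.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


# Stand-in implementations that answer the same inclusive range-sum query.
data = list(range(10_000))
implementations = {
    "list_scan": lambda q: sum(x for x in data if q[0] <= x <= q[1]),
    "slice_sum": lambda q: sum(data[q[0]:q[1] + 1]),
}
timings = benchmark(implementations, (100, 5000))
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several repeats reduces noise from other activity on the machine; actual timings depend on the hardware, so none are shown here.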
20
HOWLER Front End
Graphing Script:
• Uses a heavily modified version of phpplot.
• Also used by RACOFI music.
• Written in PHP.
21
HOWLER Front End
What is PHP?
• PHP is a widely-used general-purpose
scripting language that is especially
suited for Web development and can be
embedded into HTML.
22
HOWLER Back End
System Structure:
Web → Python → C++
23
HOWLER Back End
Python Scripts:
• Used to wrap C++ datacube classes.
• Used for building cubes because of its
simplicity.
• Uses metadata (C++ does not).
24
HOWLER Back End
How does Python use the C++ classes?
Answer: Boost.Python
25
HOWLER Back End
What is Boost.Python?
• A C++ library for exposing C++ classes to
Python. The wrapped class is compiled into a
shared library that Python can import.
26
HOWLER Back End
C++ implementation:
• LEMUR
• Various classes for different cube types:
standard, chunked, RAM, and memory-mapped
datacubes.
• Plus more!!!
27
HOWLER Back End
LEMUR (My Contribution):
• Memory Mapped class
• Fixed RAM cube
• FileUtil class for functions related to files.
• PythonDatacube class (used by Boost)
28
HOWLER Back End
Datacube Types:
1. Standard – stores cube in a flat file.
2. RAM – stores entire cube in memory.
3. Memory Mapped – maps file on disk into
memory.
4. Chunked – stores cube in file, but differently
than the standard datacube.
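The difference between the first three storage styles can be sketched in Python. This is an illustration only: the file layout (one 8-byte float per cell), the file name, and the function names are assumptions, and the real LEMUR classes are written in C++.

```python
import mmap
import os
import struct
import tempfile

CELL = struct.Struct("<d")  # assumed layout: one 8-byte little-endian float per cell


def write_cube(path, values):
    """Write the cells one after another into a flat binary file."""
    with open(path, "wb") as f:
        for v in values:
            f.write(CELL.pack(v))


def read_cell_file(path, offset):
    """Flat-file style: seek to the cell and read it with ordinary file I/O."""
    with open(path, "rb") as f:
        f.seek(offset * CELL.size)
        return CELL.unpack(f.read(CELL.size))[0]


def read_cell_mmap(path, offset):
    """Memory-mapped style: map the file and index into it like a byte array."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return CELL.unpack_from(m, offset * CELL.size)[0]


path = os.path.join(tempfile.mkdtemp(), "cube.bin")
values = [1.5, 2.5, 3.5, 4.5]
write_cube(path, values)
ram_cube = list(values)  # RAM style: the whole cube is held in memory

print(read_cell_file(path, 2), read_cell_mmap(path, 2), ram_cube[2])
```

With memory mapping, the operating system pages parts of the file in on demand, so a cube on disk can be addressed like an in-memory array.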
29
HOWLER Back End
Performance:
30