
Performance Comparison of
Clustered Systems
Yugandhar Maram, #91527748
Anjana Vadivel, #78563168
Stuthi Balaji, #34682837
OUTLINE





Motivation/Goals
System architecture/tools used/software integrated
Related work and efforts
Validation/Evaluation
Results
Motivation and Goals



To study the architecture of widely used distributed systems and familiarise ourselves with Hadoop, Spark and the Google File System.
To analyze the performance of these distributed systems under heavy workloads.
Hive DB and SparkSQL
System Architecture
Hadoop cluster with the database distributed across nodes.
Hive (issuing SQL queries to the Hadoop distributed system)
Spark cluster using HDFS.
SparkSQL (issuing SQL queries to the Spark distributed system)
Tools used/Software Integrated
• Hadoop and Spark, with Hive and SparkSQL atop those systems, respectively.
• TPC-H benchmark data for load generation.
• DBGen (the TPC-H data generator)
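
As a minimal sketch of the load-generation step, the snippet below builds a DBGen command line. The `-s` (scale factor) and `-f` (overwrite existing output files) flags are standard dbgen options. The `dbgen_command` helper, the `./dbgen` path and the scale factor of 30 are assumptions for illustration: the slides give a database size of 30 GB but do not state the scale factor that was used.

```python
import shlex


def dbgen_command(scale_factor, dbgen_path="./dbgen"):
    """Build the argument list for a dbgen run (hypothetical helper)."""
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    # -s sets the TPC-H scale factor (1 is roughly 1 GB of raw data);
    # -f overwrites any .tbl files left over from an earlier run.
    return [dbgen_path, "-s", str(scale_factor), "-f"]


# Assumed scale factor; not stated in the slides.
cmd = dbgen_command(30)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` from the directory that holds the dbgen binary and its `dists.dss` file.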
Related work and efforts (cont.)


Set up the Hadoop and Spark environments, along with Hive and SparkSQL databases of size 30 GB, on the cluster.
Issued TPC-H benchmark SQL queries to the Hive and SparkSQL databases, each of which queries data spread across the nodes of its system.
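
The measurement step above can be sketched as a small timing harness. This is an illustration under stated assumptions, not the setup used in the project: SQLite stands in for Hive and SparkSQL so that the example runs anywhere, the three-row `lineitem` table is made up, and the query is a simplified form of TPC-H Q6 with commonly quoted parameter values. The `time_query` helper is hypothetical; on a real cluster its `execute` argument would wrap a Hive or SparkSQL client call.

```python
import sqlite3
import time

# Simplified form of TPC-H Q6 (assumed parameter values).
Q6 = """
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= '1994-01-01'
  AND l_shipdate < '1995-01-01'
  AND l_discount BETWEEN 0.05 AND 0.07
  AND l_quantity < 24
"""


def time_query(execute, sql, runs=3):
    """Run sql `runs` times through execute(sql); return (last result, timings)."""
    timings = []
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = execute(sql)
        timings.append(time.perf_counter() - start)
    return result, timings


# Stand-in database: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_quantity REAL, l_extendedprice REAL, l_discount REAL, l_shipdate TEXT)"
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [
        (10, 1000.0, 0.06, "1994-03-15"),  # satisfies every predicate
        (30, 2000.0, 0.06, "1994-06-01"),  # quantity too high
        (5, 500.0, 0.06, "1996-01-01"),    # outside the date range
    ],
)


def run_sqlite(sql):
    return conn.execute(sql).fetchall()


rows, timings = time_query(run_sqlite, Q6)
print("revenue:", rows[0][0])
print("best of %d runs: %.6f s" % (len(timings), min(timings)))
```

Passing the engine as a callable keeps the timing loop identical for both systems, so any difference in the recorded times comes from the engine rather than from the harness.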
Hive Query Results
THANK YOU!!