Example: Data Mining for the NBA

Data and Applications Security
Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
NoSQL Data Management
NoSQL Overview
 A NoSQL database provides a mechanism for storage and retrieval of data
that is modeled in means other than the tabular relations used in relational
databases. Motivations for this approach include simplicity of design,
horizontal scaling and finer control over availability.
 NoSQL databases are often highly optimized key–value stores intended
primarily for simple retrieval and appending operations, whereas an RDBMS
is intended as a general purpose data store. There will thus be some
operations where NoSQL is faster and some where an RDBMS is faster.
 NoSQL databases are finding significant and growing industry use in big data
and real-time web applications. NoSQL systems are also referred to as "Not
only SQL" to emphasize that they may in fact allow SQL-like query languages
to be used.
 Barriers to the greater adoption of NoSQL data stores in practice include: the
lack of full ACID transaction support, the use of low-level query languages,
the lack of standardized interfaces, and the huge investments already made
in SQL by enterprises.
Taxonomy
 There have been various approaches to classifying NoSQL databases, each with
different categories and subcategories. Because of the variety of approaches
and their overlaps, it is difficult to get and maintain an overview of
non-relational databases. Nevertheless, the basic classification is based on
data model. A few of these categories and their prototypes are:
- Column: HBase, Accumulo, Cassandra
- Document: MarkLogic, MongoDB, Couchbase
- Key-value: Dynamo, Riak, Redis, MemcacheDB, Project Voldemort
- Graph: Neo4J, OrientDB, Allegro, Virtuoso
Classification
 Classification based on data model
 Data-structures server: Redis
 Document Store: MarkLogic, CouchDB, MongoDB, Jackrabbit, XML-Databases, ThruDB,
CloudKit, Persevere, Riak Basho, Scalaris
 KV Cache: Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBoss Cache,
Velocity, Terracotta, Gigaspaces XAP
 KV Store: Keyspace, Flare, SchemaFree, RAMCloud
 KV Store - Eventually consistent: Dynamo, Voldemort, Dynomite, SubRecord, MotionDb,
DovetailDB
 KV Store - Ordered: TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
 Object Database: ZopeDB, DB4O, Shoal, Perst
 Tuple Store: Gigaspaces, Coord, Apache River
 Wide Columnar Store: BigTable, HBase, Cassandra, Hypertable, KAI, OpenNeptune,
Qbase, KDI
Classification
 Classification based on Feature
 Data Model
- Column Store, Document Store, Graph Database, Key–value Stores, Relational
Database
 Performance
 Scalability
 Flexibility
 Complexity
 Functionality
Document Store
 The central concept of a document store is the notion of a
"document".
 While each document-oriented database implementation differs on
the details of this definition, in general, they all assume that
documents encapsulate and encode data (or information) in some
standard formats or encodings.
 Encodings in use include XML, YAML, and JSON as well as binary
forms like BSON, PDF and Microsoft Office documents (MS Word,
Excel, and so on).
 Different implementations offer different ways of organizing and/or
grouping documents:
- Collections
- Tags
- Non-visible Metadata
- Directory hierarchies
 Collections are roughly analogous to tables in a relational database, and
documents to records. But they are different: every record in a table has the
same sequence of fields, while documents in a collection may have fields
that are completely different.
 Documents are addressed in the database via a unique key that
represents that document. Another defining characteristic
of a document-oriented database is that, beyond the simple key-document
(or key–value) lookup that you can use to retrieve a
document, the database offers an API or query language that
allows retrieval of documents based on their contents.
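The two access paths described on this slide, lookup by unique key and retrieval by content, can be illustrated with a minimal in-memory sketch. It is not the API of any real document database: the class `DocumentCollection` and its methods `insert`, `get` and `find` are hypothetical names chosen for illustration, documents are plain Python dictionaries, and the sample records are made up.

```python
import uuid


class DocumentCollection:
    """Toy in-memory collection (hypothetical, not a real database API)."""

    def __init__(self):
        self._docs = {}  # unique key -> document (a dict)

    def insert(self, doc, key=None):
        # Each document is addressed by a unique key; generate one if absent.
        key = key or uuid.uuid4().hex
        self._docs[key] = dict(doc)
        return key

    def get(self, key):
        # Simple key-document lookup; returns None for an unknown key.
        return self._docs.get(key)

    def find(self, **criteria):
        # Content-based retrieval: return the documents whose fields match
        # every criterion. Documents lacking a field simply do not match.
        return [
            doc for doc in self._docs.values()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


people = DocumentCollection()
# Documents in one collection may have completely different fields.
people.insert({"name": "Ada", "city": "Dallas"}, key="p1")
people.insert({"name": "Lin", "city": "Dallas", "languages": ["en", "zh"]}, key="p2")
people.insert({"title": "Quarterly report", "pages": 12}, key="d1")

by_key = people.get("p1")               # lookup by unique key
in_dallas = people.find(city="Dallas")  # lookup by content
```

The third document shares no fields with the first two, yet it lives in the same collection; a relational table would require all three rows to follow one schema.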
Graph
 This kind of database is designed for data whose relations are well
represented as a graph (elements interconnected with an
undetermined number of relations between them). The kind of data
could be social relations, public transport links, road maps or
network topologies, for example.
 Examples: AllegroGraph, IBM DB2, DEX, FlockDB, InfiniteGraph, Neo4j,
OpenLink Virtuoso, OrientDB, Sones GraphDB, Sqrrl Enterprise,
OWLIM
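To make the graph model concrete, the sketch below stores social relations as an adjacency list and answers a typical graph question: how many relations separate two people. It is an illustration in plain Python, not the query interface of any product listed above; the function name `hops` and the sample names are assumptions made for this example.

```python
from collections import deque

# Adjacency list: each node maps to the set of nodes it is related to.
# The names and friendships are made-up sample data.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}


def hops(graph, start, goal):
    """Return the number of relations on a shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        # Breadth-first search: visit all neighbours before going deeper.
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


distance = hops(friends, "ann", "eve")  # ann -> bob -> dan -> eve
```

Each node may have any number of relations, which is the "undetermined number of relations" the slide refers to; a traversal like this follows them directly instead of joining tables.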
Key Value Stores
 Key–value stores allow the application to store its data in a schema-less
way. The data could be stored in a datatype of a programming
language or an object. Because of this, there is no need for a fixed
data model. The following types exist:
 KV - eventually consistent, KV - hierarchical, KV - cache in RAM,
KV - solid state or rotating disk, KV - ordered, Object database,
Tabular, Tuple store, Triple/Quad Store (RDF) database, Multivalue
databases, Reality, the original Pick/MV Database, Revelation
Software's, D3 Cell database
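The schema-less idea can be sketched in a few lines. The class `KeyValueStore` below is a hypothetical in-memory stand-in, not the interface of any store named in these slides; it only shows that values of different shapes can sit under different keys with no fixed data model. The keys and values are made-up sample data.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical interface)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the value, so no schema is enforced.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting an unknown key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
# Values may be any datatype of the programming language or an object.
store.put("user:42", {"name": "Ada", "visits": 3})
store.put("counter:hits", 1017)
store.put("session:abc", ["login", "view", "logout"])

hits = store.get("counter:hits")
```

The only supported access path is the key itself. Finding every user with a given name would require scanning all values, which is the kind of operation where an RDBMS or a document store with content queries tends to fit better.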
NoSQL in the Cloud
 NoSQL databases can be run on-premises, but are also often run on
IaaS or PaaS platforms like Amazon Web Services, RackSpace or
Heroku. There are three common deployment models for NoSQL on
the cloud:
 Virtual machine image - cloud platforms allow users to rent virtual
machine instances for a limited time. It is possible to run a NoSQL
database on these virtual machines. Users can upload their own
machine image with a database installed on it, use ready-made
machine images that already include an optimized installation of a
database, or install the NoSQL database on a running machine
instance.
 Database as a service - some cloud platforms offer familiar NoSQL
database products, such as MongoDB, Redis and Cassandra, as a service,
without the user launching a virtual machine instance for the
database. The database is provided as a managed service, meaning that
application owners do not have to install and maintain the database
on their own, and they pay according to usage. Some
database-as-a-service providers offer additional features, such as
clustering or high availability, that are not available in the
on-premises version of the database.
 Native cloud NoSQL databases - some providers offer a NoSQL
database service which is available only on the cloud. A well-known
example is Amazon’s SimpleDB, a simple NoSQL key-value store.
SimpleDB cannot be installed on a local machine and cannot be
used on any cloud platform except Amazon’s.
Reference
 http://en.wikipedia.org/wiki/NoSQL#Key.E2.80.93value_stores