The Dasvis Project Presentation
www.sgt-inc.com
SGT Technology Innovation Center
Dasvis Project
12 March 2015
© 2015 SGT Inc.
Rohit Mital
Jay Ellis
Ashton Webster
Grant Orndorff
Introduction
• About the SGT Technology Innovation Center
• Genesis of the Dasvis project
Purpose
• Project Goals
– Develop a real-time distributed processing framework for big data
– Determine how tools like Dasvis (built upon this framework) can fit in with other tools in the marketplace
– Design and develop a tool suite complementary to SGT’s Cyber Security capabilities, ensuring the security of SGT and our infrastructure
– Dasvis is designed to be a customizable network monitoring tool
▪ Will mirror the capabilities of standard SIEM, Network IDS/IPS, and other tools
▪ Can accept a variety of inputs
Data Exfiltration in the News
• Sony Pictures Entertainment - 2014
– Attack on Sony by the “Guardians of Peace” (with suspected nation-state involvement) in retaliation for the release of the movie “The Interview”
– Exfiltration of PII from Sony employees and family members, emails, executive salaries, and previously unreleased Sony movies
– Cancellation of the wide-scale theatrical movie release
• Edward Snowden - 2013
– Former NSA contractor and CIA/DIA employee who released thousands of classified documents about the NSA’s global surveillance programs
– Charged by the US DOJ with espionage and theft of government property (charges carrying up to 30 years); currently living in Russia
• WikiLeaks - 2006 to present
– 1.2 million documents published in the first year after the website launched
– The leak to WikiLeaks by PFC Manning (currently serving a 35-year prison term) is considered the largest leak of classified information in history, including:
▪ 500,000+ US Army reports (Afghan and Iraq War Logs)
▪ 250,000+ unredacted US State Department cables
Real-World Applications
• Large-scale data exfiltration from both the government and commercial sectors is becoming all too common
• Loss of sensitive and classified data is occurring both for and by corporations and nation-states
• This indicates a need for companies to monitor network and/or user activity to protect against these types of threats
• Tools and frameworks are needed to process the amount of information necessary to thwart these types of attacks
System Architecture
• Cloud-based, real-time distributed processing framework
• Developed using standard, open-source tools with an available labor pool to support future maintenance and expansion
• Designed with flexibility and portability in mind
Dasvis Architecture / Tools
• Configuration Processing
– Apache 2.4 Web Server
• Capture and Processing
– Packet Captures: Pcap4j
– Data Transfer: Apache Kafka Queue
– Distributed/Real-Time Processing: Apache Storm/Trident
• Data Storage
– NoSQL Databases:
▪ Primary Packet Store: MongoDB
▪ Aggregate/Time Series DB: Cube DB
• Reporting/Graphing
– Apache 2.4 Web Server
– PHP Web Framework: Laravel
– Graphing/Visualizations: Google Visualizations
• Post Processing (Future)
– Integration with HDFS/Hadoop, with queries using HQL
Apache Kafka Queue
• Kafka is a distributed messaging system used to transfer large amounts of data between processes
• It is a queue with producers and consumers:
– Producers push data to a Kafka queue
– Consumers pull data from a Kafka queue
• Essentially a reliable way to send big data from one place to another, in virtually any format
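The push/pull pattern above can be sketched with Python's standard-library `queue` as a stand-in for a Kafka topic (a real Kafka deployment would partition, persist, and replicate the log across brokers; the topic name and record fields here are illustrative, not from Dasvis):

```python
import json
import queue
import threading

# Stand-in for a Kafka topic: a thread-safe FIFO queue.
packet_topic = queue.Queue()

def producer(records):
    """Push serialized records onto the queue, as a packet-capture producer would."""
    for record in records:
        packet_topic.put(json.dumps(record))

def consumer(n, results):
    """Pull and deserialize n records, as a downstream Storm spout would."""
    for _ in range(n):
        results.append(json.loads(packet_topic.get()))

records = [{"src": "10.0.0.1", "bytes": 1500}, {"src": "10.0.0.2", "bytes": 60}]
results = []
t = threading.Thread(target=consumer, args=(len(records), results))
t.start()          # consumer blocks until data arrives
producer(records)  # producer pushes records in order
t.join()           # results now holds the records, in FIFO order
```

Because the queue is FIFO with a single producer and consumer, `results` ends up equal to `records`; Kafka gives the same ordering guarantee per partition.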
MongoDB and CubeDB
• MongoDB is a NoSQL database
– Has collections (analogous to tables in SQL) that can accept documents of varying structures
– Uses JavaScript Object Notation (JSON) for a more flexible format (documents are similar to rows in SQL)
– Unlike other databases (e.g. MySQL) that require every object inserted to have the exact same structure/schema
• CubeDB is a Time Series database that sits on top of MongoDB
– A time series database is highly optimized for queries based on time of insertion
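The schema-free behavior described above can be sketched with plain Python objects standing in for a MongoDB collection (the field names are hypothetical; real MongoDB access would go through a driver such as PyMongo):

```python
import json

# A toy "collection": a list that, like a MongoDB collection, accepts
# JSON documents with differing structures (no fixed schema).
packets = []

def insert(collection, document):
    # Round-trip through JSON to mimic the document encoding.
    collection.append(json.loads(json.dumps(document)))

# Two documents with different fields coexist in the same collection;
# a SQL table would reject a row with missing or extra columns.
insert(packets, {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "TCP", "bytes": 1500})
insert(packets, {"src": "10.0.0.2", "dns_query": "example.com"})

# Query by field, skipping documents that lack it (as a Mongo find() filter does).
tcp = [d for d in packets if d.get("proto") == "TCP"]
```

Only the first document matches the `proto` filter; the second is simply skipped rather than causing a schema error.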
Apache Storm/Trident
• Storm allows one to process large amounts of data in real time by providing an abstraction for writing distributed processing programs
– Spout: a unit that creates a stream of data to be processed
– Bolt: a unit that accepts a stream of data, performs an operation on it, and optionally passes on more data
– Topology: a collection of spouts and bolts connected by the streams of data passed between them
• Storm bolts and spouts can be run as multiple tasks (threads) and even on different machines in parallel
• Trident is a further abstraction on top of Storm that handles the creation of spouts and bolts in what it deems the most efficient topology
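The spout/bolt/topology abstraction can be sketched with plain Python generators (this is not the real Storm API, where spouts and bolts are classes submitted to a cluster; the packet sizes and operations are made up for illustration):

```python
def packet_spout():
    """Spout: creates a stream of (pretend) packet sizes."""
    for size in [1500, 60, 40, 1500, 900]:
        yield size

def filter_bolt(stream):
    """Bolt: accepts the stream and passes on only large packets."""
    for size in stream:
        if size >= 500:
            yield size

def sum_bolt(stream):
    """Bolt: aggregates the stream into a running total."""
    total = 0
    for size in stream:
        total += size
        yield total

# Topology: spout -> filter bolt -> sum bolt, connected by the streams
# passed between them.
totals = list(sum_bolt(filter_bolt(packet_spout())))  # [1500, 3000, 3900]
```

Storm's contribution is running each of these units as parallel tasks across machines; the wiring of streams between units is the same idea.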
How it All Fits Together
Dasvis Storm Topologies:
“Tracking” and “Comparing”
• The “Tracking” topology looks at incoming data and aggregates the data that we want to track
– Aggregated data is stored in the Time Series database and sent to the Comparing topology
• The “Comparing” topology compares the incoming data to the Baseline Data to look for anomalies
[Diagram: raw data enters the Tracking topology, which either discards it or aggregates it; aggregated data flows to the Comparing topology, which compares it to baseline data and emits comparison information]
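The Comparing step can be sketched as a tolerance check against the baseline. This is a hypothetical sketch: the metric names and the 50% band are illustrative, not taken from Dasvis.

```python
# Illustrative baseline: expected per-minute aggregates.
BASELINE = {"bytes_per_min": 10_000, "dns_queries_per_min": 40}
TOLERANCE = 0.5  # allow +/-50% around each baseline value (assumed threshold)

def compare(aggregate, baseline=BASELINE, tolerance=TOLERANCE):
    """Return the metrics whose observed values fall outside the band."""
    anomalies = []
    for metric, expected in baseline.items():
        observed = aggregate.get(metric, 0)
        if abs(observed - expected) > tolerance * expected:
            anomalies.append(metric)
    return anomalies

normal = compare({"bytes_per_min": 9_500, "dns_queries_per_min": 38})    # []
suspect = compare({"bytes_per_min": 80_000, "dns_queries_per_min": 41})  # exfil-like spike
```

A spike to 80,000 bytes/min against a 10,000 baseline is flagged, while small fluctuations pass, which matches the slide's notion of "significantly different" incoming data.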
A Closer Look at the Tracking Topology
• The Tracking topology consists of one spout and six bolts (data flows from the spout through the bolts):
– Packet Spout: packet is retrieved from the Kafka queue
– Single Insertion Bolt: packet is inserted into MongoDB
– Packet Parse Bolt: packet is parsed to JSON
– Packet Match Bolt: packet is matched against the configurations
– Packet Aggregation Bolt: packet is aggregated over time with other packets
– Aggregate Insertion Bolt: packet aggregate data is stored in the Time Series database
– Aggregate Forward Bolt: aggregated packets are sent to the Comparing topology via a Kafka queue
• Spouts and bolts make for simple programming abstractions
– Spouts start the data processing
– Bolts are operations on those packets
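The parse/match/aggregate steps above can be sketched as plain functions applied in sequence (the raw-line format, field names, and tracked-host configuration are hypothetical; in Dasvis each step is a Storm bolt):

```python
from collections import defaultdict

def parse_bolt(raw):
    """Packet Parse Bolt: raw capture line -> JSON-style dict."""
    src, dst, size = raw.split(",")
    return {"src": src, "dst": dst, "bytes": int(size)}

def match_bolt(packet, tracked_hosts):
    """Packet Match Bolt: keep only packets matching the configuration."""
    return packet if packet["src"] in tracked_hosts else None

def aggregation_bolt(packets):
    """Packet Aggregation Bolt: total bytes per tracked source."""
    totals = defaultdict(int)
    for p in packets:
        totals[p["src"]] += p["bytes"]
    return dict(totals)

raw_lines = ["10.0.0.1,10.0.0.9,1500", "10.0.0.2,10.0.0.9,60", "10.0.0.1,10.0.0.9,40"]
tracked = {"10.0.0.1"}  # assumed configuration: hosts we want to track
matched = [m for m in (match_bolt(parse_bolt(r), tracked) for r in raw_lines) if m]
aggregate = aggregation_bolt(matched)  # would be stored and forwarded via Kafka
```

The untracked host's packet is discarded, and the two tracked packets are rolled up into one per-source aggregate, mirroring the slide's data flow.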
A Closer Look at the Tracking Topology (continued)
• Bolts can run as multiple tasks
– Tasks can be thought of as threads
[Diagram: the same topology, with each bolt running as several parallel tasks]
A Closer Look at the
Tracking Topology
Node 1
Node 2
Packet
Spout:
Packet
Spout:
Node 4
Single Insertion Bolt:
Packet Parse
Bolt:
Bolts can run on
multiple nodes
in a cluster
•
Packet Parse
Bolt:
•
Packet
Match Bolt:
Packet
Match Bolt:
Aggregate Forward
Bolt:
Each bolt can still
run as multiple
tasks
This greatly
improves
performance
Bolts
Packet Aggregation
Bolt:
Aggregate Insertion
Bolt:
Tasks
Nodes
Node 3
3/12/2015
Node 5
© 2015 SGT Inc.
15
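The idea of one bolt fanning out over multiple tasks can be sketched with a thread pool (Storm manages this distribution itself, including across nodes; the parse function and line format below are the same hypothetical ones used for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def parse_bolt(raw):
    """One bolt's work: raw capture line -> dict."""
    src, dst, size = raw.split(",")
    return {"src": src, "dst": dst, "bytes": int(size)}

raw_lines = [f"10.0.0.{i},10.0.0.9,{i * 100}" for i in range(1, 5)]

# Four worker threads (tasks) share the parse work; map preserves input order,
# so downstream bolts still see a coherent stream.
with ThreadPoolExecutor(max_workers=4) as pool:
    parsed = list(pool.map(parse_bolt, raw_lines))
```

With CPU-heavy or I/O-heavy bolts, spreading tasks over threads and then over cluster nodes is where the performance gain on the slide comes from.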
Episodes and Baseline Data
• Baseline Data is the data that represents what the incoming data to Dasvis should look like
– If the incoming data is significantly different from the Baseline Data, then we have an anomaly
• An Episode is a set of Baseline Data associated with a set of conditions
– This allows the user to have different sets of Baseline Data for different times
[Diagram: episodes of Baseline Data overriding the normal Baseline Data at specific times]
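Episode selection can be sketched as picking the first baseline whose condition matches the current time. The condition (weekday business hours) and the baseline values are assumptions for illustration, not Dasvis's actual configuration format:

```python
from datetime import datetime

# Each episode pairs a condition with its own Baseline Data.
EPISODES = [
    # Business hours on weekdays: heavier traffic is normal.
    {"condition": lambda t: t.weekday() < 5 and 9 <= t.hour < 17,
     "baseline": {"bytes_per_min": 50_000}},
]
DEFAULT_BASELINE = {"bytes_per_min": 5_000}  # the "normal" baseline

def baseline_for(timestamp):
    """Return the Baseline Data for the first episode whose condition holds."""
    for episode in EPISODES:
        if episode["condition"](timestamp):
            return episode["baseline"]
    return DEFAULT_BASELINE

busy = baseline_for(datetime(2015, 3, 9, 10))  # Monday 10:00 -> episode baseline
quiet = baseline_for(datetime(2015, 3, 9, 2))  # Monday 02:00 -> default baseline
```

The same traffic volume can then be normal at 10:00 on a Monday but anomalous at 02:00, which is exactly what episodes are for.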
Review of Dasvis-Specific Concepts
• Tracking vs. Comparing topologies
– The Tracking topology records and aggregates the incoming data we want to track
– The Comparing topology decides whether there are anomalies in incoming data by comparing it against baseline data
• Baseline Data
– Past data aggregated by Dasvis that represents the normal distribution of data
• Episode
– A set of Baseline Data that is only used at specific times (e.g. only on Mondays, or only during business hours)
Demo
• Mini Tutorial
– Creating a Baseline
– Setting Baseline Data
• Example Scenario and expected output
– Normal data that matches baseline well
– Potentially malicious activity
Summary
• Challenges / Issues
– Need to clarify the current use of open-source tools and potential costs for deploying Dasvis as a COTS product
• Future Plans
– Adding new inputs such as NetFlow, application logs, etc., in addition to packet capture
– Adherence to the NIST Cyber Security Situational Awareness specification
Comments/Questions?
• Your Feedback is Appreciated!