Flexible Web Visualization for Alert-Based Network Security Analytics
Lihua Hao¹, Christopher G. Healey¹, Steve E. Hutchinson²
¹North Carolina State University, ²U.S. Army Research Laboratory
ARO MURI Meeting, ASU, October 29, 2013
1/22
Introduction
• Building a visualization tool for Army Research Laboratory (ARL) network security analysts
• Driven by analysts
- “Do not fit our problem to your tool, but build a tool to fit our problem.”
- Our approach does not focus explicitly on network security data, but rather on network security analysts
• Balance
- Meeting needs of the analysts
- Applying knowledge and best practices from visualization
• A web-based visualization tool to support flexible network data analysis
• Looking for comments and advice about an idea
- Will the ongoing ensemble visualization research be useful in the network security domain?
- How should the techniques be adjusted to better fit the requirements of the network security domain?
2/22
Design Constraints
1. Mental models
- “Fit” the mental models the analysts use to investigate problems
2. Working environment
- Integrate into the analyst’s current working environment (e.g., web browser for ARL analysts)
3. Configurability
- Static, pre-defined presentations of the data are typically NOT useful
4. Accessibility
- The visualizations should be familiar to analysts (avoid steep learning curve)
5. Scalability
- Support query and retrieval from multiple data sources
6. Integration
- Augment the analyst’s current problem-solving strategies with useful support
3/22
Existing Visualization Techniques
• Node-link graphs
- Portall, HoNe, LinkRank
• Treemaps
- NetVis, NFlowVis
• Timelines and Event Plots
- An aggregate value over all events
- The patterns of individual events
• Basic Charts
- Snorby, NVisionIP
• Zooming, Multivariate
- NVisionIP: galaxy, small multiple, and machine views
- VisFlowConnect: global, domain, internal, and host statistics views
4/22
Data Management
• MySQL & PHP running on a remote server
- Provide reasonable scalability
- Efficient data filtering and projection
• No pre-defined table format
- The analyst chooses columns to visualize
- Sets table correlations and data filtering
- Flexibility and configurability
• Only cache results of the current query in memory
- Generate queries to retrieve new data on demand
• Full SQL is available on demand
- Analysts provide visualization requirements
- System generates whole queries automatically
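The query-generation idea on this slide (the analyst picks columns, table correlations, and filters, and the system writes the SQL) could look roughly like the sketch below. It is an illustration only, not the tool's implementation: the slides describe a MySQL and PHP back end, while Python is used here for brevity, and `VisualizationRequest`, `build_query`, and the table and column names in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VisualizationRequest:
    """What an analyst specifies. All field names are hypothetical."""
    columns: list[str]                                   # columns to visualize
    tables: list[str]                                    # tables to draw from
    joins: list[tuple[str, str]] = field(default_factory=list)    # table correlations
    filters: list[tuple[str, str, object]] = field(default_factory=list)  # (column, op, value)


ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def build_query(req: VisualizationRequest) -> tuple[str, list]:
    """Return a parameterized SELECT statement and its parameter list.

    Filter values become "%s" placeholders (the style used by common Python
    MySQL drivers). Column and table names are inserted as given; a real
    system would validate them against the database schema first.
    """
    conditions = [f"{left} = {right}" for left, right in req.joins]
    params = []
    for column, op, value in req.filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        conditions.append(f"{column} {op} %s")
        params.append(value)
    sql = f"SELECT {', '.join(req.columns)} FROM {', '.join(req.tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params
```

For example, a request for `flow.src_ip` and `alert.timestamp` from hypothetical `flow` and `alert` tables, correlated on `flow.id = alert.flow_id` and filtered to one destination IP, yields a single SELECT with one placeholder. Only the rows returned by that query would need to be held in memory.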
5/22
Web-Based Visualization
• ARL analysts work in a browser
- Mental models & working environment
• HTML5’s canvas element
- No external plug-ins required
- Run in any modern web browser
- Accessibility
• Use 2D charts
- Common in other security visualization systems
- Effective for presenting values, trends, patterns and relationships our analysts want to explore
- Accessibility
6/22
Analyst-Driven Charts
• RGraph for basic chart visualizations
- General information visualization with 2D charts
- Only choose types of charts commonly used in network data visualization
• Assisted chart selection based on data and task (capability)
- Proportion and frequency comparison (pie)
- Value comparison over a secondary attribute (bar)
- Trends of change of a value over time (line)
- Correlation between two attributes (scatterplots)
- Range related correlation (gantt)
• Initialize chart properties
- E.g., background grids, glyph size, color and type
• Free to change the initial choices
[Figure: example charts; axis labels include dest_ip, src_ip, "src_ip, port", time, and number of alerts]
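The assisted chart selection listed on this slide pairs each analysis task with a chart type and then initializes properties the analyst is free to change. A minimal sketch of that pairing follows. The task keys, the default property values, and the function name `initial_chart` are assumptions made for illustration; they do not come from the tool or from RGraph.

```python
# Task-to-chart pairing taken from the slide; the dictionary keys are
# hypothetical labels for the five tasks.
CHART_FOR_TASK = {
    "proportion": "pie",             # proportion and frequency comparison
    "value_comparison": "bar",       # value comparison over a secondary attribute
    "trend_over_time": "line",       # trends of change of a value over time
    "correlation": "scatter",        # correlation between two attributes
    "range_correlation": "gantt",    # range related correlation
}

# Assumed starting values; the slide names only the kinds of properties
# (background grids, glyph size, color and type), not their defaults.
DEFAULT_PROPERTIES = {
    "background_grid": True,
    "glyph_size": 3,
    "glyph_color": "steelblue",
    "glyph_type": "circle",
}


def initial_chart(task: str, **overrides) -> dict:
    """Suggest a chart for a task and let the analyst override any choice."""
    if task not in CHART_FOR_TASK:
        raise ValueError(f"unknown task: {task}")
    chart = {"type": CHART_FOR_TASK[task], **DEFAULT_PROPERTIES}
    chart.update(overrides)  # the analyst's changes win over the suggestions
    return chart
```

Calling `initial_chart("trend_over_time")` suggests a line chart with the assumed defaults, while `initial_chart("correlation", type="bar", glyph_size=5)` shows the analyst replacing both the suggested chart type and a property.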
7/22
Interactive Visualization
• Intelligent zoom
- Redraw chart to include only the selected chart elements
- Rescale the visual attributes of chart elements
• Tooltips for value query
- Data-driven notes attached to chart elements
- Access to quantitative data on demand
• Toolbars
- Customize glyph size and color
- Change chart title, size, label width, and so on
- Zooming, correlated views, spreadsheets
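One way to read the intelligent zoom described above is: keep only the selected elements, then recompute the axis ranges so those elements fill the chart. The sketch below shows that reading for a scatterplot. The function name, the `(x, y)` point representation, and the padding parameter are assumptions, not details from the tool.

```python
def zoom_to_selection(points, selected, padding=0.05):
    """Keep the selected points and rescale both axes to fit them.

    points:   list of (x, y) pairs currently drawn
    selected: indices of the points the analyst selected
    padding:  fraction of the data span added as margin on each side
    Returns (kept_points, x_range, y_range).
    """
    kept = [points[i] for i in selected]
    if not kept:
        raise ValueError("nothing selected")

    def padded(low, high):
        span = (high - low) or 1.0   # avoid a zero-width axis for a single value
        return (low - padding * span, high + padding * span)

    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return kept, padded(min(xs), max(xs)), padded(min(ys), max(ys))
```

An outlier that previously stretched an axis no longer affects the ranges once it is left out of the selection, so the remaining elements are drawn at a larger scale.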
8/22
Correlated Views
• A sequence of visualizations to track an ongoing investigation
- Correlate multiple data sources
- Explore data at multiple levels of detail
• Correlated charts
- Select sub-regions of a chart
- Filter corresponding rows
- Add additional constraints, tables, attributes
- Generate a follow-on, correlated chart
• Raw data spreadsheets
- Text-based value examination
- A conventional approach
- Working environment and mental models
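The correlated-chart steps above (select a sub-region, filter the corresponding rows, add an attribute, generate a follow-on chart) can be sketched over in-memory rows as follows. This is an assumption-laden illustration: the real tool issues new database queries, and the function name and the row fields used in the usage note are hypothetical.

```python
from collections import Counter


def follow_on_counts(rows, region, group_by):
    """Derive the data for a follow-on chart from a selected sub-region.

    rows:     list of dicts, one per alert or flow
    region:   {attribute: (low, high)} bounds of the selected sub-region
    group_by: attribute to summarize in the follow-on chart
    Returns a Counter mapping each group_by value to its row count.
    """
    selected = [
        row for row in rows
        if all(low <= row[attr] <= high for attr, (low, high) in region.items())
    ]
    return Counter(row[group_by] for row in selected)
```

For instance, selecting a time window on an alerts-over-time chart and grouping by a hypothetical `dest_ip` field gives the per-destination alert counts for just that window, which could then be drawn as a bar or pie chart.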
9/22
Track Visualization Requests
• Record visualization requests in each step
• When a new request is issued, list all previous requests, actions and charts
• Improve an analyst’s “working memory” capacity
10/22
Trap Data
• Need real world data to test the system
• For security reasons, it is not possible to use data from ARL for testing
• The trap server
- Data from network security researchers at NCSU
- Real world network traffic in Computer Science building
- Transmitted to a Snort sensor to perform: (1) intrusion detection and (2) extraction of network packets
- Stores two types of data: (1) NetFlow data and (2) Snort alerts
• An example file for 24 hours of data
- 17.4GB of packet headers
- 938K unique source IPs, 168K unique destination IPs
- 1.6M flows with 615K alerts
11/22
Summary of Our Web-Based Visualization
• MySQL & PHP based database management
- Scalability, data filtering and projection
- No predefined table format
• Web-based visualization & analyst-driven 2D charts
- Mental model & working environment
- Avoid steep learning curve
- Select chart based on data and task
- RGraph
• Interactive Visualization
- Intelligent zoom, tooltips, toolbar
• Correlated Views
- A sequence of visualizations
- Track an ongoing investigation
- Raw data spreadsheets
12/22
Ensemble Visualization
• Scientific ensemble analysis & visualization
- A collection of related datasets (members), from runs of a simulation or an experiment, with slightly varying initial conditions or parameters
- Focus on scalability (data attribute, data element, member)
- Relationships between members (comparison, aggregation, pattern mining)
• Apply to network security data
- Scalability is also critical
- Relationships between network traffic flows
- Opportunity to apply ongoing research from ensembles to the network security domain
• How is a network security dataset an ensemble?
- E.g., NetFlow ensemble (member: a NetFlow)
- Distributions of alerts within and between NetFlows
• Are ensemble techniques useful in the network security domain?
- Determine the value added of this analysis
13/22
Two Stages of Ensemble Analysis
1. Structure the members into sets based on their similarities
- Level of detail clustering
- Visualize the cluster hierarchy as a tree
- Analysts choose members to visualize from the cluster tree (configurability)
2. Visualize the member sets
- Use chart visualizations
- Working environment, accessibility
14/22
NetFlow Similarity Measurement
1. Time duration
2. Density of alerts
3. Distributions of alerts
4. Types of alerts within NetFlow
• Analysts decide
- Which factors to measure
- Weights of each factor
- Configurability
[Figure: example NetFlows, each 46 secs long, containing 1 alert, 7 alerts, and 7 alerts]
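The slide names four similarity factors and lets analysts choose which to measure and how to weight them, but it does not define the individual measures. The sketch below fills them in with simple stand-ins that are assumptions only: a min/max ratio for duration and alert density, a histogram overlap for the distribution of alerts over the flow, and Jaccard overlap for alert types. The `Flow` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    duration: float            # seconds
    alert_times: list[float]   # alert offsets from the flow start, in seconds
    alert_types: set[str]      # distinct alert types seen in the flow


def ratio(a: float, b: float) -> float:
    """Similarity in [0, 1] of two non-negative numbers; 1 when they are equal."""
    return 1.0 if a == b else min(a, b) / max(a, b)


def histogram(flow: Flow, bins: int = 4) -> list[float]:
    """Fraction of the flow's alerts falling in each equal-width time bin."""
    counts = [0] * bins
    for t in flow.alert_times:
        idx = min(int(t / flow.duration * bins), bins - 1) if flow.duration > 0 else 0
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def similarity(a: Flow, b: Flow, weights: dict[str, float]) -> float:
    """Weighted average of the four factor scores; omitted factors get weight 0."""
    density_a = len(a.alert_times) / a.duration if a.duration > 0 else 0.0
    density_b = len(b.alert_times) / b.duration if b.duration > 0 else 0.0
    hist_a, hist_b = histogram(a), histogram(b)
    union = a.alert_types | b.alert_types
    scores = {
        "duration": ratio(a.duration, b.duration),
        "density": ratio(density_a, density_b),
        "distribution": 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(hist_a, hist_b)),
        "types": len(a.alert_types & b.alert_types) / len(union) if union else 1.0,
    }
    total = sum(weights.get(name, 0.0) for name in scores)
    if total <= 0:
        raise ValueError("at least one factor needs a positive weight")
    return sum(weights.get(name, 0.0) * score for name, score in scores.items()) / total
```

With only `{"duration": 1}` weighted, two 46-second flows score as identical regardless of their alerts. Adding weight to `density` separates a flow with 1 alert from one with 7, which is the kind of configurability the slide attributes to the analyst.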
15/22
NetFlow Cluster Tree
• Clustering at varying threshold of similarity
• Analysts choose tree nodes to visualize
- Trade-off: similarity vs. number of members
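Clustering at several similarity thresholds produces nested groupings that can be read as levels of a tree: a strict threshold gives many small, tight clusters, and a loose one gives few large clusters. The slides do not name the clustering algorithm, so the sketch below swaps in plain single-linkage grouping (two members join the same cluster if a chain of pairs at or above the threshold connects them). All names are hypothetical.

```python
def clusters_at(n, sim, threshold):
    """Single-linkage clusters of items 0..n-1 at one similarity threshold.

    sim(i, j) returns the similarity of items i and j; pairs with
    sim >= threshold are linked, and linked items share a cluster.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim(i, j) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_levels(n, sim, thresholds):
    """Clusters for each threshold, strictest first (leaves toward the root)."""
    return {t: clusters_at(n, sim, t) for t in sorted(thresholds, reverse=True)}
```

Each level trades similarity against size, matching the trade-off on the slide: picking a node from a strict level shows a few closely related NetFlows, while a node from a loose level shows more members that are less alike.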
16/22
NetFlows Ensemble – 123 Members
• Analysts define members to form an ensemble
17/22
A Cluster of NetFlows
• Currently all NetFlows are visualized individually in a gantt chart
• Developing methods to aggregate NetFlows into a composite visualization
[Figure: gantt chart of the NetFlows in a cluster; axes: source IP, port vs. time]
18/22
Feedback for Further Adjustment
• Ensemble analysis and visualization is flexible
- Techniques vary based on requirements of applications
• Different perspectives to define a network ensemble (member)?
• Useful ways to measure correlations between ensemble members?
• Useful ways to structure ensemble members?
• Special requirements for the composite visualization?
• Other recommendations?
19/22
Future Work
• Analysis Sandbox
- Individual analyses can be performed, stored, reviewed and compared
- Improve an analyst’s “working memory” capacity
• Analysis Preferences
- Track an analyst’s actions to better anticipate their strategies for specific types of tasks
- Use preference elicitation algorithms to track an analyst’s interest within a visualization session
• Real-world Integration
- Not allowed to speak directly with the analysts
- Coordinate with IT staff who support the analysts
• Ensemble Visualization
- Further adjust existing techniques to meet the requirements of the network security domain
- Integrate into the web-based network security visualization tool
20/22
Progress Summary
• Papers
- Flexible Web Visualization for Alert-Based Network Security Analytics. Hao,
Healey, and Hutchinson. In Proceedings VizSec 2013 (Atlanta, GA), 2013.
• Students supported
- Lihua Hao, PhD candidate, NC State University
• Projects supported
- Web-based visualization for network security analytics
- Ensemble visualization for network security analytics
21/22
FY 2014 Research Plan
• Validation of web-based tool with ARL collaborators
- Finalize web-based visualization tool
- Present tool to ARL IT staff
- Integrate feedback into tool’s design, iterate on requested changes and improvements
• Investigation of scalability support through ensemble visualization
- Confirm interest in pursuing scalability support
- Integrate ensemble visualization research into web-based visualization tool
- Update visualizations to support intelligent summarization and aggregation
22/22