70% of Information Leaks are Internal
Academic Advisor: Dr. Yuval Elovici
Technical Advisor: Dr. Lidror Troyansky
• PortAuthority Offers Businesses the Opportunity to Gain Insight Into
Their Information Leak Vulnerabilities.
• 70% of Information Leaks are Internal
Most organizations focus on preventing outside-in security
breaches, but industry analysts argue that up to 70% of security
breaches occur from the inside-out. Leaks of private and
confidential information are a growing threat to organizations
of any size.
• Example of an information leak through file sharing:
http://www.ynet.co.il/articles/0,7340,L-2875208,00.html
Air force officer in the IDF suspended over sharing confidential army
documents…
• P2P Networks.
– Gnutella, Gnutella2, BitTorrent, eDonkey2000,
Kademlia.
– P2P networks are typically used for connecting
nodes via largely ad hoc connections.
– Sharing content files containing audio, video,
data or anything in digital format is very common
(including confidential information).
– Real-time data, such as VoIP, is also passed
using P2P technology.
Continued…
[Figure: leak scenario on the Gnutella network. Labels: Computer A (sharing non-confidential files); Laptop B (containing an organization's confidential file); PDA C (searches for and downloads the organization's confidential file); routers; Gnutella network; P2P Inspector Gadget; Organization Firewall; Client Organization.]
• Develop a system which will:
– Allow the scanning parameters to be configured.
– Scan the P2P networks.
– Download files suspected of being confidential.
– Analyze the material using Machine Learning.
– Generate reports.
– Produce statistics.
[Figure: system architecture. Components: P2P Network, P2P Scanner Client, File Analyzer, Inspector Gadget Database, Application Borders. Flow labels: find and download suspected files; analyzing information; discovers confidential files.]
• Scanning the P2P network (Gnutella) for
suspicious target information (e.g. information
suspected of being confidential).
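As a rough illustration of the scanning step, the sketch below filters search hits against configurable scanning parameters. The `SearchHit` record, the keyword list, and the extension list are assumptions made for illustration only; the slides do not say how a hit is judged suspicious, and no real Gnutella client API is used here.

```python
from dataclasses import dataclass

# Hypothetical record for one search result; the fields are assumptions.
@dataclass(frozen=True)
class SearchHit:
    filename: str
    size_bytes: int
    host_ip: str

# Illustrative scanning parameters (not taken from the slides).
KEYWORDS = ("confidential", "internal", "secret")
EXTENSIONS = (".doc", ".pdf", ".xls", ".txt")

def is_suspicious(hit: SearchHit) -> bool:
    """Flag a hit whose name has a document extension and a keyword."""
    name = hit.filename.lower()
    return name.endswith(EXTENSIONS) and any(k in name for k in KEYWORDS)

def select_for_download(hits):
    """Return the hits worth downloading for later analysis."""
    return [h for h in hits if is_suspicious(h)]

hits = [
    SearchHit("Budget_CONFIDENTIAL.doc", 48_000, "192.0.2.10"),
    SearchHit("holiday_song.mp3", 4_200_000, "192.0.2.11"),
    SearchHit("secret_recipe.mp3", 3_900_000, "192.0.2.12"),
]
suspects = select_for_download(hits)
print([h.filename for h in suspects])
```

A filename filter like this only narrows down what to download; deciding whether a file is actually confidential is left to the analysis step.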
• Downloading the suspicious target information
(e.g. information suspected of being confidential)
from the P2P network (Gnutella).
• Analyzing the scanned results (determining
the value of the documents).
– The system will use Machine Learning,
based on the Bayesian filtering algorithm, to classify the
documents.
• Bayesian filtering is the process of using Bayesian
statistical methods to classify documents into categories.
• Bayesian filtering gained attention when it was described
in the paper A Plan for Spam by Paul Graham, and has
become a popular mechanism to distinguish illegitimate
spam email from legitimate "ham" email.
• Bayesian filtering takes advantage of Bayes' theorem, which
says that the probability that a document belongs to a certain
group (confidential documents), given that it has certain
words in it, is equal to the probability of finding those
words in a document from that group, times the probability
that any document is of that group, divided by the
probability of finding those words in any group:
P(group | words) = P(words | group) × P(group) / P(words)
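The rule above can be sketched as a small naive Bayes classifier. This is a minimal illustration, not the project's implementation: it assumes words are independent given the group (the "naive" assumption), uses add-one smoothing, and trains on four made-up sentences. The class name `NaiveBayes` and the labels `confidential` and `public` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of words
        self.doc_counts = Counter()  # label -> number of training docs
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def posteriors(self, text):
        """Return P(label | words) for every label, via Bayes' theorem."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            # log P(label): the prior probability of the group
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in tokenize(text):
                # log P(word | label), with add-one smoothing
                score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        # Dividing by P(words) amounts to normalizing over all labels.
        top = max(log_scores.values())
        exp = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(exp.values())
        return {l: v / norm for l, v in exp.items()}

nb = NaiveBayes()
nb.train("internal salary report do not distribute", "confidential")
nb.train("secret merger plan internal only", "confidential")
nb.train("free music download top hits", "public")
nb.train("holiday photos and music playlist", "public")

result = nb.posteriors("internal merger report")
print(max(result, key=result.get))
```

Working in log space avoids underflow when documents contain many words; the final normalization plays the role of the division by the probability of the words in any group.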
• Statistics Gathering:
– The number of users who currently hold the target
information.
– The geographic location of the leaked information,
found using IP geolocation.
– The history of searched-for, downloaded, and analyzed
files.
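The first two statistics could be gathered along the following lines. This is only a sketch: the `Sighting` record is hypothetical, and `GEO_TABLE` is a stand-in for a real IP geolocation database, filled with made-up mappings for reserved documentation address ranges.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record: one host seen sharing one file.
@dataclass(frozen=True)
class Sighting:
    file_hash: str
    host_ip: str

# Stand-in for a real IP geolocation lookup (made-up data).
GEO_TABLE = {"192.0.2.": "IL", "198.51.100.": "US", "203.0.113.": "DE"}

def country_of(ip):
    """Map an IP address to a country code using the toy table."""
    for prefix, country in GEO_TABLE.items():
        if ip.startswith(prefix):
            return country
    return "unknown"

def holder_stats(sightings, file_hash):
    """Count distinct holders of a file and group them by country."""
    holders = {s.host_ip for s in sightings if s.file_hash == file_hash}
    by_country = Counter(country_of(ip) for ip in holders)
    return len(holders), dict(by_country)

sightings = [
    Sighting("abc", "192.0.2.10"),
    Sighting("abc", "192.0.2.10"),    # same host seen twice
    Sighting("abc", "198.51.100.7"),
    Sighting("abc", "192.0.2.44"),
    Sighting("xyz", "203.0.113.5"),
]
count, by_country = holder_stats(sightings, "abc")
print(count, by_country)
```

Holders are deduplicated by IP address, so a host that answers several searches is counted once.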
[Use case diagram. Actor: User.]
1. Start system
2. Disconnect from network
3. Connect to the network
4. Shut down system
5. Scan network
6. Analyze downloaded files
7. Update system parameters
8. View statistics
Scan network - Use Case Diagram
User
System
1: Start scan
2: Scan the network
3: Download results to disk
4: End of scan
5: Start use case 6 (Analyze downloaded files)
Analyze downloaded files - Use Case Diagram
System
1: Convert Files on disk to text format
2: Scan files using "smart" algorithm
3: Save results to statistics database
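The three steps above might be wired together as in the sketch below. Several parts are assumptions: only plain-text files are converted, the `score` function is a keyword-ratio placeholder for the "smart" algorithm, and an in-memory SQLite database stands in for the statistics database. The table and function names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

def to_text(path):
    """Step 1: convert a file to text (only plain text is handled here)."""
    return Path(path).read_text(encoding="utf-8", errors="replace")

def score(text):
    """Step 2: placeholder for the "smart" algorithm (keyword ratio)."""
    words = text.lower().split()
    flagged = sum(w in {"confidential", "secret"} for w in words)
    return flagged / len(words) if words else 0.0

def analyze(paths, db):
    """Step 3: save one result row per analyzed file."""
    db.execute("CREATE TABLE IF NOT EXISTS results (name TEXT, score REAL)")
    for p in paths:
        db.execute("INSERT INTO results VALUES (?, ?)",
                   (Path(p).name, score(to_text(p))))
    db.commit()

# Demo on a temporary file, so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "memo.txt"
    sample.write_text("confidential budget memo draft", encoding="utf-8")
    db = sqlite3.connect(":memory:")
    analyze([sample], db)
    rows = db.execute("SELECT name, score FROM results").fetchall()
print(rows)
```

Keeping conversion, scoring, and storage in separate functions would let the placeholder scorer be swapped for a trained classifier without touching the other steps.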
• Performance constraints:
– The system should return search results
for a suspicious target within 15
minutes.
– The system timeout for downloading
should be configurable.
– The system should keep history results and
statistics for no more than one year.
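These constraints could be captured in a small configuration object, sketched below. The `ScanConfig` class and the `prune_history` helper are hypothetical, and the 10-minute default download timeout is an assumed value; only the 15-minute search limit and the one-year retention come from the constraints above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ScanConfig:
    # From the constraints: results within 15 minutes.
    search_timeout: timedelta = timedelta(minutes=15)
    # Configurable; the default here is an assumption.
    download_timeout: timedelta = timedelta(minutes=10)
    # From the constraints: keep history for at most one year.
    retention: timedelta = timedelta(days=365)

def prune_history(records, config, now):
    """Drop history records older than the retention period."""
    cutoff = now - config.retention
    return [r for r in records if r["timestamp"] >= cutoff]

now = datetime(2024, 6, 1)
records = [
    {"name": "old.doc", "timestamp": datetime(2023, 1, 1)},
    {"name": "new.doc", "timestamp": datetime(2024, 5, 1)},
]
config = ScanConfig(download_timeout=timedelta(minutes=30))
kept = prune_history(records, config, now)
print([r["name"] for r in kept])
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the pruning rule easy to test.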
• Safety and Security:
– The system will not be used for any purpose
other than finding information leaks in P2P
networks (for example, it will not be used to find
shared MP3 files).
– The system will not expose the
confidential documents it downloads or
the documents used to train the
Machine Learning algorithm.
– Platform constraints:
• OS: Windows XP.
• Database: MS SQL Server 2000.
– Programming languages: restricted to
Python, Java/J2EE, C++ and C#.
• Mainly a research project.
– Algorithm risk (Machine
Learning).
– Is it good for confidential
documents?
• Action to be taken:
– Feasibility Study.
[Flowchart labels, in order: Start; Feasibility Study; Is Successful?; Add more functionality; Try another algorithm; End.]
What does successful mean?
• Gnutella is an old network.
– May not contain confidential information.
– Action to be taken:
• Test suite.
• Use a different P2P network.
Epilogue
• Elovici: "A company's security is only as strong as its
weakest link"...
• Come visit the site, one and all:
– www.cs.bgu.ac.il/~amirf/AMOS