A study of Android application security - CSE - USF

A Study of Android
Application Security
William Enck, Damien Octeau, Patrick McDaniel, Swarat Chaudhuri
Systems and Internet Infrastructure Security Laboratory
The Pennsylvania State University
SAKWANNUENG TRAKOOLSHOKE -SATIAN
UNIVERSITY OF SOUTH FLORIDA
9 NOVEMBER 2015
Outline
Introduction
◦ Understand smartphone application security
Background
◦ What is Android? What are the Android system components?
The ded decompiler
◦ How does it work?
Evaluating Android Security
◦ Focuses of analysis
Application Analysis Results
◦ Program analysis results
Limitations and Conclusion
◦ The research’s limitations and observation of results
Introduction
A Study of Android Application Security
◦ Seeks to better understand smartphone application security
◦ Studies 1,100 popular free Android applications
◦ Introduces the ded decompiler
◦ Analyzes 21 million lines of recovered code
◦ Uncovers pervasive use/misuse of personal/phone identifiers
◦ Finds deep penetration of advertising and analytics networks
Introduction
Enormous security challenges
◦ Rapidly developed and deployed applications, coarse permission systems, privacy-invading behaviors,
malware, and limited security models have led to exploitable phones and applications
Lack of a common definition of security, combined with the volume of applications
◦ Malicious, questionable, and vulnerable applications will find their way to the market
This paper broadly characterizes the security of applications in the Android Market
◦ Design and implement a Dalvik decompiler, ded
◦ Analyze the recovered code using automated tests and manual inspection
◦ Identify the root causes of any discovered vulnerabilities
Background
Android
◦ An OS designed for smartphones
◦ Provides a sandboxed application execution environment
◦ A customized embedded Linux system interacts with the phone hardware
◦ The binder middleware and application API run on top of Linux
◦ An application’s only interface to the phone is through these APIs
Background
Android system architecture
◦ Each application executes within its own Dalvik Virtual Machine (DVM) instance, under a unique UNIX uid
◦ Applications interact with each other and the phone through IPC
◦ Intents are typed interprocess messages directed to particular applications or system services
Background
Android system architecture
◦ Persistent content provider data stores are queried through SQL-like interfaces
◦ Background services provide RPC and callback interfaces that trigger actions or access data
◦ UI activities receive named action signals from the system and other applications
◦ Access to system resources, data, and IPC is governed by permissions assigned at install time
Background
Android system architecture
◦ The permissions an application requests are declared in its manifest file
◦ An application is allowed to access a resource or interface only if it holds the required permission
◦ At install time, the user is presented with a screen listing the permissions the application requires
Background
Dalvik Virtual Machine (DVM)
◦ Android applications are written in
Java, but run in the DVM
◦ The DVM and Java bytecode run-time environments differ substantially
◦ Java applications are composed of one or more .class files, one file per class
◦ The JVM loads the bytecode for a Java class from the associated .class file as it is referenced at run time
◦ A Dalvik application consists of a single .dex file containing all application classes
Background
Dalvik Virtual Machine (DVM)
◦ After the Java compiler creates JVM bytecode, the Dalvik dx compiler consumes the .class files,
recompiles them to Dalvik bytecode, and writes the resulting application into a single .dex file
◦ This process consists of the translation, reconstruction, and interpretation of three basic elements of
the application
◦ Constant pools describe the constants used by a class, including, among other items, references to other classes, method names,
and numerical constants
◦ Class definitions consist of basic information, e.g., access flags and class names
◦ Data elements contain the method code executed by the target VM, as well as other information related to methods and to class
and instance variables
Background
DVM vs. JVM
Register architecture
◦ DVM: register based. Assigns local variables to any of the 2^16 available registers and manipulates registers directly
◦ JVM: stack based. Assigns local variables to a local variable table and pushes them onto an operand stack
Instruction set
◦ DVM: 218 opcodes. Instructions include the source and destination registers
◦ JVM: 200 opcodes. Tens of opcodes are dedicated to moving elements between the stack and the local variable table
Dalvik applications have on average 30% fewer instructions than their Java counterparts, but a 35% larger code size
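To make the register versus stack comparison concrete, here is a minimal sketch of two toy interpreters evaluating c = a + b. The opcode names (iload, iadd, istore, add-int) are real JVM and Dalvik mnemonics, but the tuple encoding and both interpreters are assumptions made for illustration only. They do not model real bytecode layouts.

```python
def run_stack(code, local_vars):
    """Toy JVM-style execution: values move through an operand stack."""
    stack = []
    for op, *args in code:
        if op == "iload":        # push a local variable onto the stack
            stack.append(local_vars[args[0]])
        elif op == "iadd":       # pop two operands, push their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "istore":     # pop the result into a local variable
            local_vars[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return local_vars


def run_register(code, regs):
    """Toy Dalvik-style execution: each instruction names its registers."""
    for op, *args in code:
        if op == "add-int":      # dst = src1 + src2, no stack traffic
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


# c = a + b, with a in slot 0, b in slot 1, c in slot 2
stack_code = [("iload", 0), ("iload", 1), ("iadd",), ("istore", 2)]
register_code = [("add-int", 2, 0, 1)]

print(run_stack(stack_code, [3, 4, 0]), len(stack_code))
print(run_register(register_code, [3, 4, 0]), len(register_code))
```

The register form needs one instruction where the stack form needs four, but that single instruction carries three register operands, which is in line with the slide's point about fewer instructions and larger code size.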
Background
DVM vs. JVM
Constant pool structure
◦ DVM: a single pool that all classes simultaneously reference. Primitive-type constants are inlined into the bytecode
◦ JVM: elements are replicated in the constant pools of the multiple .class files
Control flow structure
◦ DVM: bytecode structure does not mirror the source code
◦ JVM: bytecode structure loosely mirrors the source code
Ambiguous primitive types
◦ DVM: uses the same opcodes for integers and floats
◦ JVM: distinguishes between int/float/long/double
Background
DVM vs. JVM
Null references
◦ DVM: uses a zero-value constant as null
◦ JVM: has a null value
Comparison of object references
◦ DVM: uses a comparison between two integers and a comparison between an integer and zero
◦ JVM: uses typed opcodes for the comparison of object references and for null comparison of object references
Storage of primitive types in arrays
◦ DVM: uses ambiguous opcodes to store and retrieve elements in arrays of primitive types. The array type must be recovered for correct translation
◦ JVM: unambiguous
The ded decompiler
Building a decompiler from DEX to Java proved to be surprisingly challenging
◦ Java decompilation has been studied since the 1990s
◦ Prior to our work, there existed no functional tools for Dalvik bytecode. The vast differences
between the JVM and DVM meant that simple modification of existing decompilers was not possible
◦ We chose to decompile to Java source rather than operate on the DEX opcodes directly for two reasons:
◦ To leverage existing tools for code analysis
◦ To have access to source code for identifying false positives resulting from automated code analysis, e.g., by manual confirmation
The decompiler is freely available at http://siis.cse.psu.edu/ded
The ded decompiler
ded extraction occurs in three stages:
◦ Retargeting
◦ Optimization
◦ Decompilation
The ded decompiler
Application retargeting
◦ Recovering typing information
◦ Translating the constant pool
◦ Retargeting the bytecode
Type Inference
◦ Identify class and method constants/variables
◦ Only the variable width (32 or 64 bits) is known, not the type
◦ Dalvik does not distinguish between integer and object reference comparison
◦ Determine unknown types by how they are used in operations with known-type operands
The ded decompiler
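Type Inference
◦ ded adopts the accepted approach: infer register types by observing how they are used in operations with known-type operands
◦ Dalvik bytecode reuses registers whose variables are no longer in scope
◦ Three ways ded infers a register’s type
◦ Comparison with a register of known type
◦ Types associated with instructions
◦ Passing a register to a method, or using a return value, exposes the type via the method signature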
The ded decompiler
ded type inference algorithm
◦ Identify ambiguous register declarations
◦ Each branch of control flow is pushed onto an inference stack
◦ When a branch is abandoned, the next branch is popped off the stack and the search continues
◦ Type information is forward propagated, modulo register reassignment, through the control flow graph
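The stack-driven search described above can be sketched roughly as follows. This is a simplified illustration, not ded's actual implementation: the control flow graph encoding, the instruction tuples, and the function name infer_type are all hypothetical.

```python
def infer_type(cfg, start, reg):
    """Search forward from an ambiguous declaration for a typed use of `reg`.

    cfg maps a node id to (instruction, successors). Instructions are
    hypothetical tuples: ("def", reg, type_or_None), ("use", reg, type),
    or ("nop",).
    """
    stack = list(cfg[start][1])       # branches leaving the declaration
    visited = set()
    while stack:
        node = stack.pop()            # take the next pending branch
        if node in visited:
            continue
        visited.add(node)
        instr, successors = cfg[node]
        if instr[0] == "use" and instr[1] == reg:
            return instr[2]           # a known-type use exposes the type
        if instr[0] == "def" and instr[1] == reg:
            continue                  # register reassigned: abandon branch
        stack.extend(successors)      # keep propagating forward
    return None                       # no typed use was reachable


cfg = {
    0: (("def", "v0", None), [1, 2]),   # e.g. const v0, 0: int or null?
    1: (("def", "v0", "int"), []),      # v0 is reused for another variable
    2: (("nop",), [3]),
    3: (("use", "v0", "object"), []),   # v0 passed where an object is expected
}
print(infer_type(cfg, 0, "v0"))
```

The branch through node 1 is abandoned because the register is overwritten there, so the answer comes from the other branch, where the register is used as an object reference.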
The ded decompiler
Constant pool conversion
◦ Dalvik maintains a single constant pool for the application
◦ Java maintains one for each class
◦ Dalvik bytecode places primitive-type constants directly in the bytecode
◦ Java bytecode uses the constant pool for most references
The conversion of constant pool information is performed in these steps:
◦ Identify which constants are needed for a .class file
◦ Once ded identifies the constants required by a class, it adds them to the target .class file
◦ For primitive-type constants, new entries are created
◦ For class, method, and instance variable references, the created Java constant pool entries are based on the Dalvik constant
pool entries
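As a rough illustration of the conversion steps above, the sketch below splits one shared pool into per-class pools and records how indices are renumbered. The data layout and the function name split_constant_pool are assumptions for this example. Real DEX and .class constant pools are considerably richer.

```python
def split_constant_pool(shared_pool, class_refs):
    """Build one constant pool per class from a single shared pool.

    shared_pool: list of constants shared by the whole application.
    class_refs:  {class_name: [shared-pool indices the class references]}.
    Returns {class_name: (per_class_pool, {shared_index: local_index})}.
    """
    per_class = {}
    for cls, refs in class_refs.items():
        pool, remap = [], {}
        for idx in refs:
            if idx not in remap:          # add each needed constant once
                remap[idx] = len(pool)
                pool.append(shared_pool[idx])
        per_class[cls] = (pool, remap)
    return per_class


shared_pool = ["Ljava/lang/String;", "toString", "Lcom/example/Foo;"]
class_refs = {"Foo": [2, 1, 1], "Bar": [0, 1]}
result = split_constant_pool(shared_pool, class_refs)
print(result["Foo"])
print(result["Bar"])
```

Note that "toString" ends up in both per-class pools, which is the replication the JVM format implies.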
The ded decompiler
Method Code Retargeting
◦ Preprocess the bytecode to reorganize structures that cannot be directly retargeted
◦ Linearly traverse the DVM bytecode and translate it to JVM bytecode
◦ ded reorders and annotates the bytecode with array size and type information for translation
Bytecode translation linearly processes each Dalvik instruction:
◦ Maps each referenced register to a Java local variable table index
◦ Performs an instruction translation for each encountered Dalvik instruction
◦ Patches the relative offsets used for branches based on preprocessing annotations
◦ Defines exception tables that describe try/catch/finally blocks
◦ The resulting translated code is combined with the constant pool to create a legal Java .class file
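A heavily simplified sketch of the linear translation and branch patching steps is shown below. It is an assumption-laden toy: registers map one-to-one onto local variable indices, branch targets are instruction indices rather than relative byte offsets, and only three Dalvik opcodes are handled. The function name retarget is hypothetical.

```python
def retarget(dalvik_code):
    """Translate toy Dalvik instructions to toy JVM instructions."""
    jvm_code = []
    start_of = {}      # Dalvik instruction index -> first JVM index
    pending = []       # (JVM position of branch, Dalvik target index)
    for i, (op, *args) in enumerate(dalvik_code):
        start_of[i] = len(jvm_code)
        if op == "add-int":
            dst, src1, src2 = args
            # one register instruction expands to four stack instructions
            jvm_code += [("iload", src1), ("iload", src2),
                         ("iadd",), ("istore", dst)]
        elif op == "if-eqz":
            reg, target = args
            jvm_code.append(("iload", reg))
            pending.append((len(jvm_code), target))
            jvm_code.append(("ifeq", None))   # target patched afterwards
        elif op == "return-void":
            jvm_code.append(("return",))
        else:
            raise ValueError(f"unsupported opcode {op}")
    for position, target in pending:          # patch branch targets
        jvm_code[position] = ("ifeq", start_of[target])
    return jvm_code


code = [("if-eqz", 0, 2), ("add-int", 2, 0, 1), ("return-void",)]
translated = retarget(code)
for instruction in translated:
    print(instruction)
```

Because one Dalvik instruction can expand to several JVM instructions, branch targets cannot be copied over directly and must be recomputed once the position of every translated instruction is known.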
The ded decompiler
Optimization and Decompilation
◦ The retargeted .class file can be decompiled using Fernflower or Soot
◦ ded’s bytecode yields unoptimized Java code
◦ Decompiled code is complex and hard to analyze
◦ We use Soot to optimize
◦ Soot is an optimization tool with the ability to recover source code
The ded decompiler
Source Code Recovery Validation
◦ The recovered code was virtually indistinguishable
from the original source
◦ We recovered the source code for the top 50 free applications from each of the 22 application categories, 1,100 in total
◦ Applications were obtained on September 1, 2010; code recovery took 497.7 hours, or about 20.7 days
The ded decompiler
Categories of failure
◦ Retargeting failures (0.59%)
◦ Unresolved reference
◦ Type violations
◦ Illegal bytecode
◦ Decompilation failures
◦ Decompilation limitation
Evaluating Android Security
Focus of analysis:
◦ Exploring issues uncovered in previous studies and malware advisories
◦ Searching for general coding security failures
◦ Exploring misuse/security failures in the use of the Android framework
Four approaches to evaluate recovered source code:
◦ Control flow analysis
◦ Data flow analysis
◦ Structural analysis
◦ Semantic analysis
Evaluating Android Security
Control flow analysis
◦ Imposes constraints on the sequences of
actions executed by input program P,
classifying some of them as errors
◦ A control flow rule is an automaton A whose
input words are sequences of actions of P
◦ An erroneous action sequence is one that
drives A into a predefined error state
◦ To statically detect violations specified by A,
the program analysis traces each control flow
path in the tool’s model of P
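The automaton-based rule above can be sketched as follows. The rule used here (recording must not start before a user click) is a made-up example, and the graph encoding and the function name violates are assumptions. The analysis tool used in the study is far more sophisticated.

```python
ERROR = "error"


def violates(cfg, entry, transitions, start_state):
    """Return True if some control flow path drives the automaton to ERROR.

    cfg:         {node: (action, [successor nodes])}
    transitions: {(state, action): next_state}; unlisted pairs keep the state.
    """
    stack = [(entry, start_state)]
    seen = set()                      # (node, state) pairs, so loops terminate
    while stack:
        node, state = stack.pop()
        if (node, state) in seen:
            continue
        seen.add((node, state))
        action, successors = cfg[node]
        state = transitions.get((state, action), state)
        if state == ERROR:
            return True
        stack.extend((succ, state) for succ in successors)
    return False


# Hypothetical rule: "start_recording" before any "on_click" is an error.
rule = {
    ("idle", "on_click"): "clicked",
    ("idle", "start_recording"): ERROR,
}
safe_program = {0: ("on_create", [1]), 1: ("on_click", [2]),
                2: ("start_recording", [])}
unsafe_program = {0: ("on_create", [1, 2]), 1: ("on_click", [2]),
                  2: ("start_recording", [])}
print(violates(safe_program, 0, rule, "idle"))
print(violates(unsafe_program, 0, rule, "idle"))
```

In the second program one path reaches the recording call without passing through the click handler, so tracing every path finds a violation even though another path is fine.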
Evaluating Android Security
Data flow analysis
◦ Permits the declarative specification of problematic data flows in the input program
◦ An Android phone contains several pieces of private information that should never leave the phone:
the user’s phone number, IMEI (device ID), IMSI (subscriber ID), and ICC-ID (SIM card serial number)
◦ We check that this information is not leaked to the network
◦ The specification declaratively labels program statements matching certain syntactic patterns as data
flow sources and sinks
◦ Data flows from sources to sinks are violations
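A minimal source-to-sink check over straight-line code might look like the sketch below. The source names are real TelephonyManager getters for the four identifiers listed above, but the sink name http_send, the statement encoding, and the function find_leaks are hypothetical, and the sketch ignores branches, aliasing, and method calls.

```python
# Real Android getters for phone number, IMEI, IMSI, and ICC-ID.
SOURCES = {"getLine1Number", "getDeviceId", "getSubscriberId",
           "getSimSerialNumber"}
SINKS = {"http_send"}    # hypothetical stand-in for a network write


def find_leaks(statements):
    """Report 1-based statement numbers where tainted data reaches a sink.

    Each statement is (destination_or_None, function_name, argument_names).
    """
    tainted, leaks = set(), []
    for number, (dst, func, args) in enumerate(statements, start=1):
        uses_taint = any(arg in tainted for arg in args)
        if func in SINKS and uses_taint:
            leaks.append(number)
        if dst is not None:
            if func in SOURCES or uses_taint:
                tainted.add(dst)       # taint propagates to the result
            else:
                tainted.discard(dst)   # overwritten with clean data
    return leaks


program = [
    ("imei", "getDeviceId", []),
    ("url", "concat", ["base", "imei"]),
    (None, "http_send", ["url"]),      # IMEI reaches the network
    ("url", "const", []),
    (None, "http_send", ["url"]),      # url was overwritten: clean
]
print(find_leaks(program))
```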
Evaluating Android Security
Structural analysis
◦ Allows for declarative pattern matching on the abstract syntax of the input source code
◦ Not concerned with program executions or data flow
Evaluating Android Security
Semantic analysis
◦ Allows the specification of a limited set of constraints on the values used by the input program
◦ The analyzer detects violations of these constraints using constant propagation techniques well known in the
program analysis literature
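One constraint of this kind is that SMS messages should not be sent to hard-coded numbers. The sketch below applies a very small form of constant propagation to straight-line code to flag such calls. sendTextMessage is a real SmsManager method name, but the statement encoding and the function hard_coded_sms_targets are assumptions for illustration.

```python
def hard_coded_sms_targets(statements):
    """Return constant destinations passed to sendTextMessage.

    Statements are hypothetical tuples:
      ("const", var, value)       var = literal
      ("copy", dst, src)          dst = src
      ("input", var)              var = value unknown until run time
      ("sendTextMessage", var)    send an SMS to the number held in var
    """
    constants, findings = {}, []
    for stmt in statements:
        op = stmt[0]
        if op == "const":
            constants[stmt[1]] = stmt[2]
        elif op == "copy":
            _, dst, src = stmt
            if src in constants:
                constants[dst] = constants[src]   # constant flows through
            else:
                constants.pop(dst, None)
        elif op == "input":
            constants.pop(stmt[1], None)          # no longer a known constant
        elif op == "sendTextMessage":
            if stmt[1] in constants:
                findings.append(constants[stmt[1]])
    return findings


program = [
    ("const", "a", "5551234"),
    ("copy", "dest", "a"),
    ("sendTextMessage", "dest"),    # destination is a known constant
    ("input", "dest"),
    ("sendTextMessage", "dest"),    # destination comes from the user
]
print(hard_coded_sms_targets(program))
```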
Evaluating Android Security
Analysis overview
◦ Covers both dangerous functionality and vulnerabilities
◦ Selecting properties for study was a significant challenge
Properties and specifications
◦ Misuse of Phone Identifiers: phone identifiers leaking to remote network servers. Identify data flows.
◦ Exposure of Physical Location: location exposed to advertisement servers. Identify the portion of code.
◦ Abuse of Telephony Services: malware sends SMS messages to premium numbers. Identify hard-coded phone numbers.
◦ Eavesdropping on Audio/Video: audio/video eavesdropping. Identify control flows to UI.
Evaluating Android Security
Properties and specifications (continued)
◦ Botnet Characteristics (Sockets): non-HTTP ports and protocols. Examine Socket use for suspicious behavior.
◦ Harvesting Installed Applications: list of installed applications. Survey use of APIs.
◦ Use of Advertisement Libraries: information exposure to ad and analytics networks. Survey inclusion of ad and analytics libraries.
◦ Dangerous Developer Libraries: dangerous functionality in applications. Report replication and the implications.
◦ Android-specific Vulnerabilities: search for non-secure coding practices.
◦ General Java Application Vulnerabilities: Java application vulnerabilities. Misuse of information and methods.
Application Analysis Results
Information Misuse
◦ Explore how sensitive information is leaked through information sinks: OutputStream objects
from URLConnections, HTTP GET and POST parameters in HttpClient connections, and the strings used
for URL objects
Application Analysis Results
Phone Identifiers
◦ Frequently leaked through plaintext requests
◦ Used as device fingerprints
◦ Property to a remote server
◦ The IMEI is used to track individual users
◦ The IMEI is tied to personally identifiable information (PII)
◦ Not all phone identifier use leads to exfiltration
◦ Phone identifiers are sent to advertisement and analytics servers
Application Analysis Results
Location Information
◦ The granularity of location reporting may not be
obvious to the user
◦ Sent to advertisement servers
Application Analysis Results
Phone Misuse
◦ Explore the misuse of smartphone interfaces
Application Analysis Results
Telephony Services
◦ Applications do not use fixed phone number services
◦ Applications do not misuse voice services
Background Audio/Video
◦ Applications do not misuse video recording
◦ Applications do not misuse audio recording
Application Analysis Results
Socket API Use – external server
◦ A few applications include code that uses the Socket class directly
◦ No evidence of malicious behavior by applications using Socket directly
Installed Applications
◦ Applications do not harvest information about which applications are installed on the phone
Application Analysis Results
Included Libraries
◦ Libraries included by applications are easy to identify due to namespace conventions
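Identification by namespace can be as simple as prefix matching on fully qualified class names. The sketch below assumes a small, illustrative prefix list. The application class names and the function libraries_used are hypothetical.

```python
# Illustrative package prefixes for ad and analytics libraries.
LIBRARY_PREFIXES = (
    "com.admob.android.ads",
    "com.google.ads",
    "com.flurry.android",
)


def libraries_used(class_names, prefixes=LIBRARY_PREFIXES):
    """Return the set of known library prefixes found among class names."""
    found = set()
    for name in class_names:
        for prefix in prefixes:
            # match the package itself or anything nested below it
            if name == prefix or name.startswith(prefix + "."):
                found.add(prefix)
    return found


recovered_classes = [
    "com.example.game.MainActivity",          # hypothetical app class
    "com.admob.android.ads.AdView",
    "com.flurry.android.FlurryAgent",
    "com.google.adsense.Unrelated",           # similar name, not a match
]
print(sorted(libraries_used(recovered_classes)))
```

Matching on prefix plus "." avoids counting unrelated packages whose names merely begin with the same characters.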
Application Analysis Results
Advertisement and Analytic Libraries
◦ Use of phone identifiers and location is
sometimes configurable
◦ Reporting frequency is often configurable
◦ Probe for permissions
Application Analysis Results
Developer Toolkits
◦ Many applications use developer toolkits containing common sets of utilities identifiable through
class name or library path
◦ Replicate dangerous functionality
◦ Probe for permissions
◦ Well-known brands sometimes commission developers that include dangerous functionality
Application Analysis Results
Android-specific Vulnerabilities
◦ Technical report of Android-specific vulnerabilities
Application Analysis Results
Leaking Information to Logs
◦ Private information is written to Android’s general
logging interface
Leaking Information to IPC
◦ Applications broadcast private information in IPC
accessible to all applications
Unprotected Broadcast Receivers
◦ Some applications are vulnerable to forging attacks
on dynamic broadcast receivers
Application Analysis Results
Intent Injection Attacks
◦ Some applications define intent addresses based on IPC input
Delegating Control
◦ Few applications unsafely delegate actions
Null Checks on IPC Input
◦ Applications frequently do not perform null checks on IPC input
Sdcard Use
◦ Unexpected uses of data read/write
JNI Use
◦ Code invoked through the Java Native Interface is not written in Java and has inherent dangers
Limitations
This study is limited in three ways
◦ The studied applications were selected with a bias towards popularity
◦ The program analysis tool cannot compute data and control flows for IPC between components
◦ Source code recovery failures interrupt data and control flows
Conclusion
ded and the program analysis specifications open a new door for application certification
◦ Potential to integrate these tools into an application certification process
◦ Challenging logistically and technically
Concern for misuse of privacy sensitive information
◦ IMEI, IMSI, ICC-ID
◦ Malicious intent
How is it misused?
◦ used for everything from “cookie-esque” tracking to account numbers
Conclusion
Significant penetration of ad and analytic libraries
◦ Occur in 51% of the applications studied
◦ An application could have up to 8 different libraries
Developers fail to take necessary security precautions
◦ Many developers fail to securely use Android APIs
◦ Insufficient protection of privacy sensitive information.
◦ No exploitable vulnerabilities that can lead to malicious control of the phone
We found no evidence of telephony misuse, background recording of audio or video, abusive
connections, or harvesting of lists of installed applications
Future study: perform attacks!