Introduction to Steganalysis Schemes

Multimedia Security
Outline
• Steganalysis of LSB encoding
• Steganalysis based on JPEG compatibility
• Some discussions
Introduction
• Steganography
– The art of secret communication
– Stego content (e.g. images) should not
contain any easily detectable artifacts due
to message embedding
– The less information is embedded, the
smaller the probability of introducing
detectable artifacts
Watermarking vs. Steganography
[Figure: fidelity, robustness, and capacity trade-offs of steganography vs. watermarking]
Steganalysis of LSB Encoding
Goal
• To inspect one or more color images for statistical
artifacts caused by message embedding with the
LSB method
– To find out which images are likely to
contain secret messages
– To estimate the reliability of decisions
• Type I error (false-alarm) and Type II error
(Miss)
Application Scenarios
[Figure: application scenarios. Automatic checking: an Internet node with a special filter inspects images sent to a certain address. Forensics expert: inspects images in a seized computer.]
LSB Encoding
• Replace the LSB of the gray level in every
color channel with a message bit
– On average, 50% of the LSBs are changed (a random
message bit differs from the original LSB half of the time)
– Logic behind this scheme
• LSBs in scanned or camera-taken images are
essentially random
• Encrypted (randomized) messages are random
• Therefore, no statistical artifacts should be introduced
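A minimal Python sketch of the LSB replacement described above. It assumes a hypothetical layout in which an image is a list of (R, G, B) tuples of 8-bit integers and the (already encrypted) message is a list of 0/1 bits. The names embed_lsb, cover and message are illustrative only. With random message bits, roughly half of the LSBs change, which matches the 50% figure.

```python
import random


def embed_lsb(pixels, bits):
    """Replace the LSB of each color-channel value with one message bit.

    pixels: list of (R, G, B) tuples with values 0-255 (assumed layout).
    bits: iterable of 0/1 bits; channels are left unchanged once bits run out.
    """
    out = []
    bit_iter = iter(bits)
    for pixel in pixels:
        new_pixel = []
        for value in pixel:
            bit = next(bit_iter, None)
            # Clear the LSB, then write the message bit into it.
            new_pixel.append(value if bit is None else (value & ~1) | bit)
        out.append(tuple(new_pixel))
    return out


random.seed(0)
cover = [tuple(random.randrange(256) for _ in range(3)) for _ in range(10000)]
message = [random.getrandbits(1) for _ in range(3 * len(cover))]  # stands in for an encrypted message
stego = embed_lsb(cover, message)

changed = sum(a != b for p, q in zip(cover, stego) for a, b in zip(p, q))
print(f"fraction of LSBs changed: {changed / (3 * len(cover)):.3f}")  # should be close to 0.5
```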
Important Observation
• Number of unique colors in cover images
– Typically smaller than the number of pixels in the image
(ratio of unique colors to pixels)
• About 1:2 for high-quality scans in BMP format
• 1:6 or lower for JPEG images or video
• Many true-color images have a relatively small
“palette”
• After LSB embedding, the new color palette has a
distinctive feature
– Many pairs of close colors
– Evidence of LSB encoding-based steganography
Formulations
• U: number of unique colors in an image
• P: number of close color pairs
– Two colors (R1,G1,B1) and (R2,G2,B2) are
close if |R1-R2|≤1 and |G1-G2|≤1 and |B1-B2|≤1
• R: ratio between the number of close
pairs of colors and the number of all pairs of colors
– R=P/C(U, 2), where C(U, 2)=U(U-1)/2 is the number of pairs of unique colors
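The statistics U, P and R can be computed directly from these definitions. The sketch below (the function name close_color_stats is hypothetical) stores the unique colors in a set and, instead of testing all C(U, 2) pairs, looks up the 26 neighbors of each color that differ by at most 1 in every channel. Every close pair is then found twice, so the count is halved.

```python
from itertools import product
from math import comb

# The 26 offsets (dR, dG, dB) in {-1, 0, 1}^3, excluding (0, 0, 0).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def close_color_stats(pixels):
    """Return (U, P, R) for a list of (R, G, B) pixel tuples."""
    colors = set(pixels)                      # unique colors
    u = len(colors)
    # Count neighbors that are also present; each close pair is seen twice.
    p = sum((r + dr, g + dg, b + db) in colors
            for (r, g, b) in colors
            for (dr, dg, db) in OFFSETS) // 2
    ratio = p / comb(u, 2) if u >= 2 else 0.0   # R = P / C(U, 2)
    return u, p, ratio


print(close_color_stats([(10, 10, 10), (11, 10, 10), (20, 20, 20), (10, 10, 10)]))
```

For this example the call returns (3, 1, 0.3333333333333333): three unique colors and one close pair out of three possible pairs.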
The Proposed Scheme
• After embedding a message, U increases to U’,
and the number of close color pairs
becomes P’
• The value of R for an image that does not
contain a message will be smaller than that
of an image that already has a message
embedded in it
The Proposed Scheme (cont.)
• No single threshold on R works for all
images
– Due to the large variation of U across images
• Observations that allow reliable distinguishing
– For an image that already contains a large message
• Embedding another message in it does not modify R
significantly
– For an image that does not contain a message
• R increases significantly
– Use the relative change of R as the decision
criterion
Detection Algorithm
• To find out whether or not an image has a secret
message:
1. Calculate R=P/C(U, 2)
2. Embed a test message using LSB embedding in randomly selected pixels
– Size of the test message: 3·a·M·N bits (for M by N color images)
3. Calculate R’=P’/C(U’,2)
4. Decide whether the image contains an embedded message
– R’≈R: the image already had a large message hidden in it
– R’>R: the image did not have a message in it
– R’/R: the separating statistic
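The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the image is a list of (R, G, B) tuples, the test message is simulated by writing random bits into all three LSBs of a randomly chosen fraction a of the pixels (3·a·M·N bits in total), and the cover and stego images are synthetic (a small hypothetical palette, with a maximal message emulated by randomizing every LSB). All function names are hypothetical.

```python
import random
from itertools import product
from math import comb

OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def close_ratio(pixels):
    """R = P / C(U, 2) for a list of (R, G, B) tuples."""
    colors = set(pixels)
    # Each close pair is counted once from each side, hence // 2.
    p = sum((r + dr, g + dg, b + db) in colors
            for (r, g, b) in colors for (dr, dg, db) in OFFSETS) // 2
    u = len(colors)
    return p / comb(u, 2) if u >= 2 else 0.0


def lsb_randomize(pixels, fraction, rng):
    """Write random bits into all three LSBs of a random `fraction` of the pixels."""
    out = list(pixels)
    for i in rng.sample(range(len(out)), int(fraction * len(out))):
        out[i] = tuple((v & ~1) | rng.getrandbits(1) for v in out[i])
    return out


def separating_statistic(pixels, a, rng):
    """R'/R after embedding a test message of 3*a*M*N bits."""
    r = close_ratio(pixels)                                  # step 1
    r_test = close_ratio(lsb_randomize(pixels, a, rng))      # steps 2-3
    return r_test / r if r > 0 else float("inf")             # statistic for step 4


rng = random.Random(1)
# Hypothetical cover: 20000 pixels drawn from a 300-color palette in a narrow range.
palette = [tuple(rng.randrange(100, 132) for _ in range(3)) for _ in range(300)]
cover = [rng.choice(palette) for _ in range(20000)]
stego = lsb_randomize(cover, 1.0, rng)   # emulate a maximal-size random message

print(separating_statistic(cover, 0.05, rng))   # expected well above 1
print(separating_statistic(stego, 0.05, rng))   # expected close to 1
```

On this synthetic data the statistic for the cover image should come out clearly above 1. For the fully embedded image it should stay close to 1, because the LSB variants of its colors are already present before the test message is added.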
Limitations
• If the secret message is too small
– the two ratios will be very close to each other
• Then we cannot distinguish images with and without
messages
Experiments
• Using an image database of 300 color images
– 350x250 pixels
– JPEG compressed
– Capacity of each image: about 32.8 KB (350x250x3/8 bytes)
• A message of length 20 KB (about 2/3 of the maximal capacity) was
embedded into each image to form a new database of
images with messages
• The detection algorithm is run on both databases, and
message presence is tested by embedding a test
message of size 1 KB (a≈1/30)
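The capacity and message-size figures can be checked with a few lines of arithmetic. The sketch assumes one LSB per color channel per pixel and 1 KB = 1000 bytes. The computed ratios, about 0.61 and 0.03, indicate that "2/3" and "1/30" are rounded values.

```python
M, N = 350, 250                          # image size in pixels
capacity_bits = 3 * M * N                # one LSB per color channel per pixel
capacity_kb = capacity_bits / 8 / 1000   # assuming 1 KB = 1000 bytes
secret_fraction = 20 / capacity_kb       # 20 KB secret message
test_a = 1 / capacity_kb                 # 1 KB test message

print(capacity_bits, round(capacity_kb, 1))           # 262500 32.8
print(round(secret_fraction, 2), round(test_a, 3))    # 0.61 0.03
```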
Experimental Results
[Figure: experimental results for the original database (solid line) and the embedded database (dotted line)]
Parameter Optimization
• Model the density functions of R’/R as Gaussian
distributions
– N(μ, σ) for images without a message and N(μs, σs) for images with a secret message of size s
• Different sizes of secret messages, denoted as s, and of test
messages are tested
– Secret messages: 1% to 50%
– Test messages: a=0.01 to 0.5
• Results
– μ>μs for all s
– As s decreases, N(μs, σs) becomes flatter and
its peak moves right
– As s increases, N(μs, σs) becomes narrower
and its peak moves left
• The two peaks are easier to separate for larger
secret message sizes
Threshold Selection
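Choose the threshold Th so that Type I error = Type II error
(this coincides with minimizing the overall error when the two Gaussians have equal variances and both classes are equally likely)
Shift the threshold Th to give more weight to not
missing an image with a secret message, at the expense of
more false alarms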
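When both densities are modeled as Gaussians, the equal-error threshold has a closed form. The sketch assumes the decision rule "flag the image when R’/R < Th", which is consistent with μ > μs. The function names and the parameter values in the example are hypothetical, not taken from the experiments.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Gaussian CDF Phi((x - mu) / sigma)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def error_rates(th, mu, sigma, mu_s, sigma_s):
    """Type I and Type II error when an image is flagged if R'/R < th (assumed rule)."""
    type1 = normal_cdf(th, mu, sigma)             # clean image flagged (false alarm)
    type2 = 1 - normal_cdf(th, mu_s, sigma_s)     # stego image not flagged (miss)
    return type1, type2


def equal_error_threshold(mu, sigma, mu_s, sigma_s):
    """Solve (mu - th) / sigma == (th - mu_s) / sigma_s, i.e. Type I == Type II."""
    return (mu * sigma_s + mu_s * sigma) / (sigma + sigma_s)


# Hypothetical parameters for N(mu, sigma) and N(mu_s, sigma_s), not measured values.
th = equal_error_threshold(1.10, 0.03, 1.00, 0.01)
print(th, error_rates(th, 1.10, 0.03, 1.00, 0.01))   # th ~1.025, both errors ~0.0062
```

Raising Th above this value reduces misses (Type II) at the cost of more false alarms (Type I), which is the adjustment described above.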
Conclusions
• The error probability of the prediction is mainly determined
by the size of the secret message
– The influence of the test message size is much smaller
• The optimal test message size differs for different
secret message sizes
• The detection algorithm mainly targets images with
a small number of unique colors
– The results for high-quality scanned and losslessly compressed
images (U>0.5MN) may be unreliable
Steganalysis Based on JPEG
Compatibility
Image Steganography
• Image formats
– Uncompressed (BMP)
• Offering the highest capacity and best overall security
– Palette (GIF)
• Difficult to provide security with reasonable capacity
– Lossy compressed (JPEG, JPEG 2000)
• Difficult to hide messages in the JPEG stream in a secure
manner while keeping the capacity practical
Goal of this Paper
• To show that images may be extremely poor
candidates for cover images if they were initially acquired
as JPEG images and later decompressed to a lossless format
• Steganographic methods aim for a minimal amount of
distortion to avoid visible artifacts
– Thus the act of message embedding does not erase the
characteristic structure created by JPEG compression
– The DCT coefficients of such images can be analyzed to recover even
the values of the JPEG quantization table
• Evidence of steganography
– An image stored in a lossless format that bears a strong
fingerprint of JPEG compression, yet is not fully
compatible with a JPEG-compressed image
JPEG Compression
• Uncompressed image → 8x8 blocks Borig
• DCT → coefficients dk(i), i=0,…,63, in zigzag-scan order (k indexes the block)
• Quantization with the JPEG quantization matrix Q: Dk(i)=Round(dk(i)/Q(i))
• Huffman coder
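A pure-Python sketch of the per-block forward transform and quantization. It assumes an orthonormal 8x8 DCT-II and the usual JPEG level shift by 128. For readability it indexes coefficients by (u, v) instead of the zigzag index i and omits the zigzag scan and Huffman coding. The names BASIS, dct2 and quantize are hypothetical.

```python
from math import cos, pi, sqrt

# Orthonormal 8x8 DCT-II basis: BASIS[u][x] = C(u) * cos((2x + 1) u pi / 16),
# with C(0) = sqrt(1/8) and C(u) = sqrt(2/8) = 0.5 otherwise.
BASIS = [[(sqrt(1 / 8) if u == 0 else 0.5) * cos((2 * x + 1) * u * pi / 16)
          for x in range(8)] for u in range(8)]


def dct2(block):
    """2-D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    return [[sum(BASIS[u][x] * BASIS[v][y] * block[x][y]
                 for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def quantize(block, q):
    """D(u, v) = Round(d(u, v) / Q(u, v)) after the JPEG level shift by 128."""
    d = dct2([[p - 128 for p in row] for row in block])
    # Python's round() sends exact halves to the even integer.
    return [[round(d[u][v] / q[u][v]) for v in range(8)] for u in range(8)]


q = [[10] * 8 for _ in range(8)]         # hypothetical flat quantization matrix
flat = [[130] * 8 for _ in range(8)]     # every pixel 2 above the level shift
print(quantize(flat, q)[0][0])           # DC: 0.125 * 2 * 64 = 16, and 16/10 rounds to 2
```

Because the basis is orthonormal, the transform preserves the sum of squares (Parseval's equality), which the compatibility test below relies on.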
JPEG Decompression
• Huffman decoding
• QDk(i)=Q(i)*Dk(i)
– Multiplying each quantized DCT coefficient by its
quantization step
• Braw=DCT⁻¹(QD)
– Inverse DCT
• B=[Braw]
– Rounded to integers in the range 0-255
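The decompression steps for one block can be sketched the same way, again assuming an orthonormal DCT with the 128 level shift (names are hypothetical). The function returns both Braw and the rounded, clipped block B so that the two can be compared.

```python
from math import cos, pi, sqrt

BASIS = [[(sqrt(1 / 8) if u == 0 else 0.5) * cos((2 * x + 1) * u * pi / 16)
          for x in range(8)] for u in range(8)]


def idct2(coeffs):
    """Inverse of the orthonormal 8x8 DCT-II; coeffs[u][v] -> pixels[x][y]."""
    return [[sum(BASIS[u][x] * BASIS[v][y] * coeffs[u][v]
                 for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


def decompress_block(d, q):
    """Return (Braw, B) for quantized coefficients D and quantization matrix Q."""
    qd = [[q[u][v] * d[u][v] for v in range(8)] for u in range(8)]    # QD = Q * D
    b_raw = [[p + 128 for p in row] for row in idct2(qd)]             # undo level shift
    b = [[min(255, max(0, round(p))) for p in row] for row in b_raw]  # B = [Braw]
    return b_raw, b


d = [[0] * 8 for _ in range(8)]
d[0][0] = 3                                 # only the DC coefficient is nonzero
b_raw, b = decompress_block(d, [[10] * 8 for _ in range(8)])
print(b_raw[0][0], b[0][0])                 # about 131.75, and 132
```

Here every pixel of Braw is 128 + 30/8 = 131.75 and every pixel of B is 132, so each pixel differs from Braw by 0.25.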
Observations
• If the block B has no pixels saturated at 0
or 255
– ||Braw-B||² ≤ 16, where ||·|| is the L2 norm
– Since |Braw(i)-B(i)| ≤ 0.5 for all 64 pixels, ||Braw-B||² ≤ 64×0.5² = 16
The Proposed Scheme
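• Question
– Given an arbitrary 8x8 block B of pixel values, could this block
have arisen through the process of JPEG decompression with
the quantization matrix Q (if available)?
– With QD’=DCT(B), Parseval’s equality (the DCT is orthonormal) gives
||B-Braw||² = ||DCT(B)-DCT(Braw)||² = ||QD’-QD||² ≤ 16
– Since every QD(i) is an integer multiple of Q(i),
||QD’-QD||² ≥ Σ(QD’(i)-Q(i)·round(QD’(i)/Q(i)))² = S
so S>16 means B cannot have come from JPEG decompression with Q
– Additional check (when S ≤ 16): consider every vector q whose entries q(i) are integer multiples of Q(i) close to QD’(i) with Σ(QD’(i)-q(i))² ≤ 16, and test whether B=[DCT⁻¹(q)] for at least one of them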
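The compatibility test for a single block with a known Q can be sketched as below. Assumptions: one 8-bit channel, an unsaturated block, an orthonormal DCT with the 128 level shift, and Q given as a flat list indexed by i = 8·u + v (row-major, not zigzag). The additional check is a depth-first search over candidate multiples of Q(i) within distance 4 of QD’(i), pruned by the budget of 16. Its cost grows quickly when some Q(i) are small, so this is only a sketch. All names are hypothetical.

```python
from math import cos, pi, sqrt

# Orthonormal 8x8 DCT-II basis: BASIS[u][x] = C(u) * cos((2x + 1) u pi / 16)
BASIS = [[(sqrt(1 / 8) if u == 0 else 0.5) * cos((2 * x + 1) * u * pi / 16)
          for x in range(8)] for u in range(8)]


def dct2(block):
    """Forward DCT, flattened row-major: index i = 8*u + v."""
    return [sum(BASIS[u][x] * BASIS[v][y] * block[x][y]
                for x in range(8) for y in range(8))
            for u in range(8) for v in range(8)]


def idct2(coeffs):
    """Inverse DCT of a flat list of 64 coefficients, returned as an 8x8 block."""
    return [[sum(BASIS[u][x] * BASIS[v][y] * coeffs[8 * u + v]
                 for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


def is_jpeg_compatible(block, q):
    """Could this unsaturated 8x8 block come from JPEG decompression with steps q?"""
    qd_prime = dct2([[p - 128 for p in row] for row in block])   # QD' = DCT(B)
    # S: squared distance from QD' to the nearest vector of multiples of Q(i).
    s = sum((c - qi * round(c / qi)) ** 2 for c, qi in zip(qd_prime, q))
    if s > 16:
        return False                                  # incompatible for sure
    # Additional check: multiples of Q(i) within distance 4 of QD'(i).
    cands = [[k * qi for k in range(int((c - 4) // qi), int((c + 4) // qi) + 1)
              if (c - k * qi) ** 2 <= 16]
             for c, qi in zip(qd_prime, q)]

    def search(i, chosen, budget):
        # Depth-first search over candidate vectors with sum of squares <= 16.
        if i == 64:
            b_raw = idct2(chosen)
            return all(round(b_raw[x][y] + 128) == block[x][y]
                       for x in range(8) for y in range(8))
        for v in cands[i]:
            cost = (qd_prime[i] - v) ** 2
            if cost <= budget and search(i + 1, chosen + [v], budget - cost):
                return True
        return False

    return search(0, [], 16.0)


q = [10] * 64                                  # hypothetical flat quantization table
decompressed = [[132] * 8 for _ in range(8)]   # from D(0)=3: 128 + 30/8 = 131.75, rounded to 132
tampered = [row[:] for row in decompressed]
tampered[0][0] ^= 1                            # flip one LSB: 132 -> 133
print(is_jpeg_compatible(decompressed, q))     # True
print(is_jpeg_compatible(tampered, q))         # False
```

In the tampered case S is still below 16, so it is the additional check that exposes the block: the only lattice point near QD’ decompresses to a block of 132s, not to the modified block.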
Algorithm
1. Divide the image into 8x8 blocks
2. Arrange the blocks in a list, and remove all
saturated blocks from the list
• T: number of remaining blocks
3. Extract the quantization matrix Q from all T
blocks
• If all elements of Q are 1s, the image was not
previously JPEG-compressed and is not analyzed further
Algorithm (cont.)
4. For each block B, calculate S
5. If S>16,
B is not compatible with JPEG compression
else
perform the additional check
6. After going through all T blocks, if no incompatible block is
found, there is no evidence of steganography
7. Repeat the algorithm for different 8x8 divisions (grid offsets) to
detect cropped images
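The outer loop (steps 1, 2 and 4 to 7) can be sketched independently of the per-block test, which is passed in as a function. Assumptions: a single 8-bit channel stored as a list of rows, and names that are hypothetical. Step 3 (extracting Q) is not shown because these slides do not describe it.

```python
def blocks_at_offset(image, dx, dy):
    """Split an image (list of rows of 0-255 ints) into full 8x8 blocks on a grid shifted by (dx, dy)."""
    h, w = len(image), len(image[0])
    return [[row[x:x + 8] for row in image[y:y + 8]]
            for y in range(dy, h - 7, 8) for x in range(dx, w - 7, 8)]


def is_unsaturated(block):
    """True if no pixel is saturated at 0 or 255."""
    return all(0 < p < 255 for row in block for p in row)


def count_incompatible(image, is_compatible, offsets=((0, 0),)):
    """For each grid offset, return (T, number of JPEG-incompatible blocks).

    is_compatible: any per-block test, e.g. the S check plus the additional check.
    """
    result = {}
    for dx, dy in offsets:
        blocks = [b for b in blocks_at_offset(image, dx, dy) if is_unsaturated(b)]  # steps 1-2
        bad = sum(not is_compatible(b) for b in blocks)                              # steps 4-5
        result[(dx, dy)] = (len(blocks), bad)                                        # step 6
    return result


image = [[100] * 16 for _ in range(16)]
image[0][0] = 255                        # saturates the top-left block
print(count_incompatible(image, lambda b: True, offsets=((0, 0), (1, 1))))
```

For step 7, offsets can range over all 64 grid shifts, e.g. `[(dx, dy) for dx in range(8) for dy in range(8)]`. For an uncropped image, a nonzero count on the aligned grid is the evidence of steganography described above.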
Extracting the Quantization Matrix
Some Discussions
References
• J. Fridrich, R. Du and M. Long, “Steganalysis of
LSB encoding in color images,” ICME 2000,
New York, 2000
• J. Fridrich, M. Goljan and R. Du, “Steganalysis
based on JPEG compatibility,” SPIE Multimedia
Systems and Applications IV, Denver, 2001
• G. Goth, “Steganalysis gets past the hype,” IEEE
Distributed Systems Online, April 2005