Embedding and Extracting Malicious Data from a JPEG using Steganography

So, What is Steganography?

The word steganography is derived from the Greek words steganos (meaning hidden or covered) and the Greek root graph (meaning to write). Steganography is the practice of hiding secret data within an ordinary, non-secret file or message in order to avoid detection; the secret data is then either extracted or executed at its destination. Steganography can be combined with encryption as an extra step for hiding or protecting the data.

Cyber criminals use this technique to carry out attacks, usually by hiding malicious code within a “normal” image file. A regular JPEG image can contain several megabytes of data, so an attacker can alter a small part of it to embed malicious code. The inserted data changes the bits and bytes of the original image in a way that is almost impossible for the human eye to detect. Scanning every image for hidden data is also very time consuming for machines, especially when the actual threat is unknown and there is nothing specific to look for. Image steganography can be used to hide the whole payload inside the image, or simply to embed a small piece of code that calls additional code or executables to carry out the attack.

One reason steganography is not used as much as other techniques is that these threats are tied to a specific delivery mechanism. That makes it hard to deliver malware en masse and limits the number of users cyber criminals can reach. However, the technique is not obsolete. In 2018, FortiGuard Labs researchers observed an increase in malware that used steganography to hide malicious payloads in memes shared online. This type of embedded malware started by attempting to contact a command-and-control (C2) host, which then downloaded additional code or commands associated with known attacks. The malware did not receive commands directly. Instead, it was instructed to look for additional images in the associated Twitter feed, download those images, and extract the commands hidden within them to continue its malicious activity.

Although steganography is an uncommon technique among cyber criminals, they have used it several times to deliver malicious payloads, mainly thanks to the rapid spread of content through social media.

Steganography Process:

How does it work?

A standard image is made of bits and bytes. Attackers can embed data in the original image by altering those bits and bytes in a way that is almost impossible for the human eye to detect. The hidden data can be a message, text or even malware.
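To make the idea of altering bits concrete, here is a minimal Python sketch of least-significant-bit (LSB) embedding on a raw byte buffer. It is a simplification chosen for illustration only: it is not how Steghide works, and it operates on a plain list of byte values rather than on a real JPEG file. The function names `embed_lsb` and `extract_lsb` and the sample data are made up for this example.

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytearray:
    """Hide `secret` in the least significant bit of each cover byte."""
    # One cover byte carries one secret bit, so 8 cover bytes are needed per secret byte.
    if len(secret) * 8 > len(cover):
        raise ValueError("cover is too small for this secret")
    stego = bytearray(cover)
    for i, byte in enumerate(secret):
        for bit_index in range(8):
            bit = (byte >> (7 - bit_index)) & 1      # most significant bit first
            pos = i * 8 + bit_index
            stego[pos] = (stego[pos] & 0xFE) | bit   # overwrite only the lowest bit
    return stego


def extract_lsb(stego: bytes, secret_len: int) -> bytes:
    """Read `secret_len` bytes back out of the lowest bits."""
    out = bytearray()
    for i in range(secret_len):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(200, 256)) * 2   # stand-in for pixel values (112 bytes)
secret = b"hi there"
stego = embed_lsb(cover, secret)

# Each byte changes by at most 1, which is why the eye does not notice.
max_change = max(abs(a - b) for a, b in zip(cover, stego))
recovered = extract_lsb(stego, len(secret))
print(max_change, recovered)
```

Every cover value moves by at most one step, so a pixel's brightness or color barely changes, yet the full message can be read back by anyone who knows where to look.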

How to embed malicious code in such files?

For this tutorial we are using Kali Linux and Steghide.

Step 1: Installing Steghide in Kali Linux

//
sudo apt-get install steghide
//

Step 2: Gather image and .txt files and embed code

Get a JPG image and a .txt file with any content inside.
- I used these txt and jpg files
  - random-text
Duplicate your jpg image so you can compare the original with the modified one.
Move the images and the txt file to your desktop.
In your terminal, go to the desktop or to the location where you put the images and the txt file.
Run the command:

//
steghide embed -ef nameofyourtxtfile.txt -cf nameofyourimagemodified.jpg
//

INFO: In this example, nameofyourtxtfile.txt is random-text.txt and nameofyourimagemodified.jpg is random-steg.jpg.

Passphrase: Steghide will ask for a passphrase. If you want to encrypt the content of your .txt file, enter one. If the embedded code needs to be extracted and executed without a passphrase, leave it empty.
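If you would rather not answer the passphrase prompt by hand, the same embed step can be scripted. The sketch below only builds the Steghide command line in Python. `-p` (passphrase) and `-sf` (write the result to a separate stego file instead of modifying the cover) are documented Steghide options, while the helper name `build_embed_command` and the file names are placeholders for this tutorial. Keep in mind that a passphrase passed with `-p` may end up in your shell history or process list.

```python
import shlex


def build_embed_command(secret_file, cover_file, stego_file, passphrase=""):
    """Return the steghide argument list for a non-interactive embed."""
    return [
        "steghide", "embed",
        "-ef", secret_file,   # file to hide
        "-cf", cover_file,    # cover image, left untouched because -sf is given
        "-sf", stego_file,    # output image that will contain the hidden data
        "-p", passphrase,     # passing -p (even empty) skips the interactive prompt
    ]


cmd = build_embed_command("random-text.txt", "random-original.jpg", "random-steg.jpg")
print(shlex.join(cmd))

# To actually run it on a machine where steghide is installed:
# import subprocess; subprocess.run(cmd, check=True)
```

Using `-sf` also removes the need to duplicate the image beforehand, since the original cover file is not overwritten.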

Now we can check that both files are different in size.

Step 3: Check the changed hash in Windows PowerShell

Open PowerShell on Windows.
Go to the location where you have both files.
Run:

//
Get-FileHash random-steg.jpg
//

//
Get-FileHash random-original.jpg
//

Compare both hashes. They should be different, because embedding the data changed the contents of the image.
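The same comparison can be done on Kali without switching to Windows. Here is a small Python sketch using the standard `hashlib` module with SHA-256, which is also the default algorithm of `Get-FileHash` (PowerShell prints the digest in uppercase, `hashlib` in lowercase). The helper name `sha256_of` is illustrative, and the demo uses two throwaway temporary files instead of the real images so that it can run anywhere.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with two stand-in files that differ by a single bit.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "random-original.jpg")
    modified = os.path.join(workdir, "random-steg.jpg")
    with open(original, "wb") as f:
        f.write(b"\xff\xd8" + bytes(100))
    with open(modified, "wb") as f:
        f.write(b"\xff\xd8" + bytes(99) + b"\x01")

    hash_original = sha256_of(original)
    hash_modified = sha256_of(modified)
    same_size = os.path.getsize(original) == os.path.getsize(modified)

print(hash_original)
print(hash_modified)
print("hashes differ:", hash_original != hash_modified, "| same size:", same_size)
```

Even a one-bit change produces a completely different digest, so a hash mismatch is a more reliable sign of modification than a difference in file size.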

Step 4: How to extract the embedded data from the image

Run the following command:

//
steghide extract -sf random-steg.jpg -xf random-text-extraction.txt
//
ls
//

The extracted data is written to a new txt file, random-text-extraction.txt.

Check that the new txt file was created.

Now, let’s check what data is in each .txt file. Both files should have the same content.

//
nano random-text.txt
//
nano random-text-extraction.txt
//
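Instead of reading both files by eye, you can let a script confirm that they are byte-for-byte identical. A minimal sketch with Python's standard `filecmp` module follows. The temporary files only stand in for random-text.txt and random-text-extraction.txt, so point the two paths at your real files when trying it.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    embedded = os.path.join(workdir, "random-text.txt")
    extracted = os.path.join(workdir, "random-text-extraction.txt")
    for path in (embedded, extracted):
        with open(path, "w") as f:
            f.write("any content inside\n")

    # shallow=False forces a real content comparison instead of trusting file metadata.
    identical = filecmp.cmp(embedded, extracted, shallow=False)

print("identical:", identical)
```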

With this technique it is possible to embed any type of file inside an image. You just have to replace the .txt file with an .exe or any other file.

The following paragraph explains how Steghide works internally, and is taken from the Steghide manual page:

Steghide uses a graph-theoretic approach to Steganography. You do not need to know anything about graph theory to use Steghide and you can safely skip the rest of this paragraph if you are not interested in the technical details. The embedding algorithm roughly works as follows: at first, the secret data is compressed and encrypted. Then a sequence of positions of pixels in the cover file is created based on a pseudo-random number generator initialized with the passphrase (the secret data will be embedded in the pixels at these positions). Of these positions those that do not need to be changed (because they already contain the correct value by chance) are sorted out. Then a graph-theoretic matching algorithm finds pairs of positions such that exchanging their values has the effect of embedding the corresponding part of the secret data. If the algorithm cannot find any more such pairs all exchanges are actually performed. The pixels at the remaining positions (the positions that are not part of such a pair) are also modified to contain the embedded data (but this is done by overwriting them, not by exchanging them with other pixels). The fact that (most of) the embedding is done by exchanging pixel values implies that the first-order statistics (i.e. the number of times a color occurs in the picture) is not changed. For audio files the algorithm is the same, except that audio samples are used instead of pixels.
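The manual's description can be illustrated with a heavily simplified Python sketch. It is not Steghide's code: it skips compression and encryption, stores one bit per sample as that sample's parity, pairs positions greedily rather than with a real graph matching algorithm, and ignores how different the swapped values are. All names (`embed_by_exchange`, `extract_parity`) and the random test data are invented for this example.

```python
import random
from collections import Counter


def embed_by_exchange(samples, bits, passphrase):
    """Embed bits as sample parities, preferring swaps over overwrites."""
    stego = list(samples)
    # 1. A passphrase-seeded PRNG picks the positions that will carry the bits.
    positions = random.Random(passphrase).sample(range(len(stego)), len(bits))
    # 2. Keep only the positions whose parity does not already match the bit.
    need_one = [p for p, b in zip(positions, bits) if b == 1 and stego[p] % 2 == 0]
    need_zero = [p for p, b in zip(positions, bits) if b == 0 and stego[p] % 2 == 1]
    # 3. Swapping an even sample that must become odd with an odd sample that
    #    must become even fixes both positions without changing the histogram.
    swaps = min(len(need_one), len(need_zero))
    for a, b in zip(need_one, need_zero):
        stego[a], stego[b] = stego[b], stego[a]
    # 4. Leftover positions are overwritten by flipping the lowest bit.
    for p in need_one[swaps:] + need_zero[swaps:]:
        stego[p] ^= 1
    return stego, swaps


def extract_parity(stego, n_bits, passphrase):
    """Regenerate the same positions from the passphrase and read parities."""
    positions = random.Random(passphrase).sample(range(len(stego)), n_bits)
    return [stego[p] % 2 for p in positions]


rng = random.Random(0)
samples = [rng.randrange(256) for _ in range(500)]   # stand-in for pixel values
bits = [rng.randrange(2) for _ in range(64)]         # stand-in for the secret data
stego, swaps = embed_by_exchange(samples, bits, "secret passphrase")

recovered = extract_parity(stego, len(bits), "secret passphrase")
# Histogram differences come only from the overwritten leftovers, not from swaps.
changed_counts = sum((Counter(samples) - Counter(stego)).values())
print(recovered == bits, swaps, changed_counts)
```

Because swaps only move existing values around, the count of each value stays the same for every exchanged pair. Only the few overwritten leftovers disturb the histogram, which is the property the manual calls preserving first-order statistics.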