International conference of developers
and users of free / open source software

Steganography - coding and intercepting the information from encoded pictures in the absence of any initial information

Monika Kwiatkowska, Lublin, Poland and Lukasz Swierczewski, Lomza, Poland

LVEE Winter 2014

The work includes implementation and extraction algorithms capabilities test, without any additional data (starting position, the number of bits used, gap between the amount of data encoded) information from encoded files (mostly images). The software is written using OpenMP standard [1], which allowed them to run on parallel computers. Performance tests were carried out on computers, Blue Gene/P [2], Blue Gene/Q [3] and the system consisting of four AMD Opteron 6272 [4]. Source code is available under GNU GPL v3 license and are available in a repository OLib [5].

I. Introduction

Steganography is the science of determining how to conduct communication so that the presence of the message could not be detected by third parties. It differs from cryptography in a way that in cryptography existence of the message is not negated but its content remains implicit. Steganography hides the fact that any communication has been conducted.

Differences between steganography and cryptography is shown in Table 1.


Table 1. Differences between steganography and cryptography.

II. Traditional Steganography

The use of steganography dates back to the time of Herodotus, the fifth century BC. Examples of traditional steganography can be tattooing the scalp (after the hair grew back information remains invisible). One of the best solutions of this kind applied by the Germans during World War II – microdots technique. It was based on minimazing pictures to such scale so that you can paste them into the text as a dot.

III. Digital Steganography

With the development of digital technology, steganography has found a new use also in the field of science. Digital steganography bases on making subtle changes to the original medium, and therefore gives a much more possibilities than traditional steganography.

The carrier of concealed information can be virtually any file (one that can be modified without having to worry about damage to its internal structures). However, the most commonly used are multimedia files – these are relatively large in size and difficult to capture the modification of original file.

Digital steganography depending on the type of operation can be divided into the following categories:

  • Substitutional
  • Transformational
  • Spectrum modification
  • Spectrum spread
  • Distortional
  • Statistical
  • Carrier generation

Another field of use for steganography is to communicate using VoIP technology. The Polish jargon adopted using in this case the word „steganphony”. The first such solution had been proposed by two Polish scientists Wojciech Mazurczyk and Krzysztof Szczypiorski in 2008 at a conference in Mexico 11. It was based on in transferring the hidden content in the delayed packets, which according to a standard communication protocol (operating in real time) are omitted.

IV. LSB – Least Significant Bit – one of the methods of substitution

The article will summarize often used method of substitution – LSB. Mostly it uses the least significant bits to record information. These bits often carry only noise and are insignificant from the point of view images for example. The principle of LSB operation for one and two least significant bits are shown in Figure 1 and Figure 2.


Figure 1. Applying the LSB using one least significant bit.


Figure 2. Application of the LSB method using the two least significant bits.

V. Implementation

Source code had been implemented in pure ANSI C. Sample code for function encrypting text using least significant bits of the image you can see in Listing 1.


int crypt_space_1bit(char buffer, char *open_text, long int start_position, long int open_text_size, long int space_size)
{
long int i;
for(i=0; i < (open_text_size * 8); i++)
{
if( check_bit(open_text[i/8], i%8) )
{
set_bit(buffer[start_position+i
space_size], 0);
}
else
{
clear_bit(buffer[start_position+i*space_size], 0);
}
}
}
Listing 1 The function encrypting the data in the image using one least significant bit.

We are passing pointers to two array to the function – the buffer array (image data) and open_text array (array of information to be encoded). In addition, the code uses macros check_bit, set_bit and clear_bit. Their definition is shown in Listing 2.


# define check_bit(var,pos) ((var) & (1LL<<(pos)))
# define set_bit(var,pos) ((var) |= (1LL<<(pos)))
# define clear_bit(var,pos) ((var) &= ~(1LL<<(pos)))
Listing 2. Macros defining operations on bits.

The most important, in the case of this article, however, is a function used to extract encrypted text without knowledge of the initial starting value, or even the length of ciphertext. It has been shown in Listing 3.


long int breaker_standard_1bit_unknown_size_openmp(char buffer, long int buffer_size, long int size_down, long int size_upper, result_structure *result, int number_of_threads)
{
char *open_text;

long int start_position;

long int index_result;
index_result = 0;

long int i;
long int j;
j = size_down;

#pragma omp parallel for shared(buffer, buffer_size, size_down, size_upper, result, index_result) private(i, j, start_position, open_text) num_threads(number_of_threads)
for(i=0; i < (buffer_size – j); i++)
{
open_text = (char
) malloc( (size_upper+1) * sizeof(char));

start_position = i;

for(j=size_down; j <= size_upper; j++)
{
encrypt_standard_1bit(buffer, open_text, start_position, j);

if ( ascii_veryfi(open_text, j) == 1 )
{
open_text[j] = ‘\0’;

#pragma omp critical
{
result[index_result].start_position = i;
strcpy(result[index_result].text, open_text);
index_result++;
}
}
}

free(open_text);
}

return index_result;
}
Listing 3 Function to extract information from the least significant bits without the initial information.

This function searches the least significant bits in the buffer array of size of buffer_size and searches all the possible encrypted patterns of lengths from size_down to size_upper. The result is inserted into the result array. The code uses so-called pragma derived from the OpenMP standard. This pragma aims to parallelize the main loop. Also worth mentioning is separate critical section – it takes care of the correct addition of blocks to the results array. Only one thread can access the critical section at a time. Also a function has been used:

int ascii_verify(pointer, size);

which verifies whether the starting substring at the pointer of length size is the correct text stored using ASCII code.

VI. Results

Performance results obtained when searching for encoded text inside graphic file with a resolution of 1600×1200 is shown in Table 2. A similar table for resolution 4096×4096 is presented in Table 3. Additionally, obtained through parallel programming (OpenMP), speedup on IBM Blue Gene/Q and AMD Opteron 6272 is shown in Figure 3.


Table 2. Performance results obtained during scan of the image with a resolution of 1600×1200 (searching one least significant bit using length of the search phrase in the range of 10 to 25).


Figure 3. The speedup obtained on platforms AMD Opteron 6272 and IBM Blue Gene/Q while searching an image with a resolution of 1600×1200 (search only one least significant bit of the length of the text to search in the range from 10 to 25).


Table 3. Performance results obtained during scan of the image with a resolution of 4096×4096 (searching one least significant bit using length of the search phrase in the range of 10 to 25).

In the case of a platform consisting of four AMD Opteron 6272 maximum speedup is achieved by using 64 CPU and it reached 36.00. Using computational units used in Blue Gene/Q the speedup reached to 31.915. But it was not obtained, as might be expected, with a maximum (64) number of processors, but only 48. This may be due to the fact that one Blue Gene/Q CPU has only 16 physical cores that implement the execution of 64 threads.
It can also be noted that increasing the resolution of the analyzed image from 1600×1200 to 4096×4096 did not increase the runtime of the algorithm even tenfold – for applications executed sequentially on the Blue Gene/Q time increased from 193.11 seconds to 1309.42 seconds (about 6.78 times) and for the Blue Gene/P changed from 560.47 seconds to 4769.00 (about 8.50 times). Analysis has been performed for the speedup depending on the size of the hidden string searched. The results are shown in Table 3 (for AMD Opteron 6272) and in Table 4 (for Blue Gene/Q). Presentation of speedup obtained is presented in Figure 4. Additionally, Figure 5 shows the difference in the runtime of the sequential program execution between two platforms (AMD Opteron 6272 – IBM Blue Gene/Q).


Table 4. The results of the analysis of performance for different sizes of searched hidden string [Down_Size; Upper_Size] (the ranges of [10, 15] to [10,60]) and platform based on AMD Opteron 6272.


Table 5. The results of the analysis of performance for different sizes of searched hidden string [Down_Size; Upper_Size] (the ranges of [10, 15] to [10,60]) and platform based on processors installed in the Blue Gene/Q.


Figure 4. Graph showing the resulting speedup on AMD Opteron 6272 and IBM Blue Gene/Q in the analysis of performance for different sizes of searched hidden string.


Figure 5. Graph showing the time difference obtained on a platform consisting of the AMD Opteron 6272 and obtained on the Blue Gene/Q in the analysis of performance for different sizes of searched hidden string.

The strangest fact can be seen in the analysis of speedup presented in Table 5 According to the measurements on 64 processors for searching string of a length of 10 to 60 characters, maximum achieved speedup of 70.319, which is impossible and must be the result of the time measurment error. Problematic data in the table indicated in red. This anomaly is no longer present on the computer with four AMD Opteron 6272. However, in this case, the maximum speedup was only 42.303 and was obtained while searching a string with the length in the range [10, 60].

While in terms of speedup, the platform for the Blue Gene/Q is better but the lower sequential execution runtime of the program is always achieved on a computational node with AMD Opteron 6272 – it is confirmed in Figure 5. Differences in the runtime between these two systems, however, are relatively small and do not exceed 2%.

VII. Conclusions and capabilities of work development

Using the abilities of parallel programming, one can very effectively deal with data processing in the field of steganography. LSB technique is used in most commercial software.

The work can be treated as a base for discussion, because it is very difficult to find a practical application of implemented steganography methods in the modern world. In the so-called „pure steganography” strenght of encryption is mainly based on lack of knowledge about the techniques used by the individual encrypting the information 7. So you can not publish this type of algorithms as Open Source. Such an approach, however, does not meet Kerckhoffs principle 10 stating, that a cryptographic system should be secured even if all the details about how it works (except the key) are known, and it is not recommended .

The presented work results concern the simplest variant in which the string is interpreted as ASCII code. Of course you can come up with your own, much more complicated system for the representation of characters.

!{width:590px}href="http://lvee.org/uploads/abstract_file/file/120/11.png!
Figure 6. Logo of the conference LVEE with hidden text „LVEE – the best conference”.

Acknowledgment

Interdisciplinary Centre for Mathematical and Computational Modeling (ICM), Warsaw University, Poland is acknowledged for providing the computer facilities under the Grant No. G55-11.

Note

The thesis presented on Winter LVEE 2014 Conference in Minsk (Belarus).

References

  1. Dagum, Leonardo, and Ramesh Menon. “OpenMP: an industry standard API for shared-memory programming.” Computational Science & Engineering, IEEE 5.1 (1998): 46-55.
  2. Lin, Heshan, et al. “Massively parallel genomic sequence search on the Blue Gene/P architecture.” High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for. IEEE, 2008.
  3. Haring, Ruud A., et al. “The ibm blue gene/q compute chip.” Micro, IEEE 32.2 (2012): 48-60.
  4. Keltcher, Chetana N., et al. “The AMD Opteron processor for multiprocessor servers.” Micro, IEEE 23.2 (2003): 66-76.
  5. OLib Library; http://lib.oproject.info
  6. Katzenbeisser S.: Information Hiding Techniques for Steganography and Digital Watermarking, s. 65 – 105, 1999, Artech House
  7. Katzenbeisser S.: Information Hiding Techniques for Steganography and Digital Watermarking, s. 20 – 23, 1999, Artech House
  8. Katzenbeisser, Stefan, and Fabien AP Petitcolas. Information hiding techniques for steganography and digital watermarking. Vol. 316. Norwood: Artech house, 2000.
  9. Kozieł, G. “Przezroczystość danych ukrytych w sygnale audio.” Pomiary, Automatyka, Kontrola 58 (2012): 972-974.
  10. Auguste Kerckhoffs, “La cryptographie militaire” Journal des sciences militaires, vol. IX, pp. 5–83, January 1883, pp. 161–191, February 1883.
  11. Mazurczyk, Wojciech, and Krzysztof Szczypiorski. “Steganography of VoIP streams.” On the Move to Meaningful Internet Systems: OTM 2008. Springer Berlin Heidelberg, 2008. 1001-1018.

Abstract licensed under Creative Commons Attribution-ShareAlike 3.0 license

Back