mht.wtf

A blog about computer science, programming, and whatnot.

Writing a JPEG decoder in Rust - Part 2: Implementation I

August 19, 2016 back to posts

This is a blog series. Read part 1 here

Last time we got a basic understanding of the different steps in decoding a JPEG image, as well as how the file is structured. What we did not get was any code or hints at implementation. Finally we get to see an attempt of writing a JPEG decoder, in Rust.

Additionally, I have now open sourced the project on GitHub. If you are interested in seeing the full thing, or want to know what I will try to cover in Part 3 of this series, take a look! Again, feedback is very welcome, be it typos, questions, or suggestions for improvements. As I mentioned in Part 1, the project is very much ongoing, and I have cut quite a few corners here and there. If you are testing the decoder, and find an image that is not decoded properly, I will be very interested in hearing from you1!

At last, this project is purely educational, for me and hopefully also for you. I do not actually need a new JPEG decoder, and chances are you do not either :)

Implementation

Now that we have a high level understanding of how JPEG works, we can write a simple decoder for a test image of Lena2:

lena

Our Program

For testing and validation purposes, we will create a binary program, which we will run with

cargo run <input.jpeg> <output.ppm>

The output file is a ppm file, which was the simplest way I found to see the image data we decode. The format is very simple: see here how it works. Common image viewers should open ppm files. Personally I have used eog.

JFIF

What does the JFIF specification (pdf) enforce? The file has to start with the SOI (Start of Image) marker, followed by the APP0 (Application Segment 0) marker3. According to the JPEG spec, there are 16 application markers, 0xffe0-0xffef, which are “reserved for application use”. JFIF uses the segment to hold an identifier (the string “JFIF”), and thumbnail data, among a few other things. The images I have tried to decode did not contain a thumbnail, so this is seemingly rare.

Reading the file

Initially we will simply read the whole image file. There might be some memory and/or speed optimization potential using some kind of streaming approach, but for now, let’s stick with a Vec<u8>4.

fn file_to_bytes(path: &Path) -> Result<Vec<u8>, std::io::Error> {
    File::open(path).and_then(|mut file| {
        let mut bytes = Vec::new();
        try!(file.read_to_end(&mut bytes));
        Ok(bytes)
    })
}

The Marker Segment Loop

This will be the main loop of the decoder. We keep track of where we are in the file, and read segment for segment. Since most segments say exactly how long they are, this works out pretty well.

let mut i = 0;
while i < vec.len() {
    if let Some(marker) = bytes_to_marker(&vec[i..]) {
        if marker == Marker::EndOfImage || marker == Marker::StartOfImage {
            // These markers doesn't have length bytes, so they must be
            // handled separately, in order to to avoid out-of-bounds indexes,
            // or reading nonsense lengths.
            i += 2;
            continue;
        }

        let data_length = (u8s_to_u16(&vec[i + 2..]) - 2) as usize;
        i += 4;

        match marker {
            Marker::Comment => { /* Read comment data */ }
            Marker::QuantizationTable => { /* Read table data */ }
            // Handle the rest of the markers
        }
        i += data_length;
    } else {
        panic!("Unhandled byte marker: {:02x} {:02x}", vec[i], vec[i + 1]);
    }
}

There are quite a few different segments, but not all are very interesting. The segments we will look at in this post are Comment, DefineHuffmanTable, QuantizationTable, and StartOfScan.

Reading a segment

As a simple example to get us started reading data in a segment, consider the Comment marker, which is of the following form5:

 2 bytes  2 bytes   length-2 bytes
| marker | length | comment       |

The marker bytes are already read, and so are the length bytes, from which we also subtracted 2, making data_length the actual length of the data part of the segment (comment in this case). Reading the comment into a String is pretty straight forward:

Marker::Comment => {
    let comment = str::from_utf8(&vec[i..i + data_length])
        .map(|s| s.to_string())
        .ok();
    image.comment = comment;
}

QuantizationTable

This segment contains one or more quantization tables (or matrices). Each table is 65 bytes, where the first byte contains two fields: Element Precision and Table Destination, each 4 bits large. Element precision specifies how large each value in the matrix is; 0 for 8-bits, 1 for 16-bits. For baseline sequential DCT (which is the only mode we will support), this has to be 0. Table destination specifies one of four possible “slots” for the matrix to be saved in; that is, it is possible to have four quantization matrices and use different ones in different scans.

The remaining 64 bytes are the values in the matrix. If there are multiple tables in the segment they are located right after one another.

Marker::QuantizationTable => {
    // JPEG B.2.4.1
    let mut index = i;
    while index < i + data_length {
        let precision = (vec[index] & 0xf0) >> 4;
        assert!(precision == 0);
        let identifier = vec[index] & 0x0f;
        let table: Vec<u8> = vec[index + 1..index + 65]
            .iter()
            .cloned()
            .collect();

        image.quantization_tables[identifier as usize] = Some(table);
        // 64 entries + one header byte
        index += 65;
    }
}

DefineHuffmanTable

Reading the Huffman tables are some work. Each table consists of a Table Class and a table destination (sharing one byte), 16 bytes specifying the number of codes of length 1 through 16, and a one byte value for each code. The table class specifies wether the table is used for DC or AC coefficients, but we will get back to this in Part 3.

Marker::DefineHuffmanTable => {
    // JPEG B.2.4.2

    // Head of data for each table
    let mut huffman_index = i;
    // End of segment
    let segment_end = i + data_length;

    while huffman_index < segment_end {
        let table_class = (vec[huffman_index] & 0xf0) >> 4;
        let table_dest_id = vec[huffman_index] & 0x0f;
        huffman_index += 1;

        // There are `size_area[i]` number of codes of length `i + 1`.
        let size_area: &[u8] = &vec[huffman_index..huffman_index + 16];
        huffman_index += 16;

        // TODO: replace with `.sum` as of Rust 1.11
        let number_of_codes = size_area.iter()
            .fold(0, |a, b| a + (*b as usize));

        // Code `i` has value `data_area[i]`
        let data_area: &[u8] = &vec[huffman_index..huffman_index +
                                                   number_of_codes];
        huffman_index += number_of_codes;

        let huffman_table =
            huffman::HuffmanTable::from_size_data_tables(size_area, data_area);
        // DC = 0, AC = 1
        if table_class == 0 {
            image.huffman_dc_tables[table_dest_id as usize] =
                Some(huffman_table);
        } else {
            image.huffman_ac_tables[table_dest_id as usize] =
                Some(huffman_table);
        }
    }
}

Processing the table data we read from the file is a little work: this happens in huffman::HuffmanTable::from_size_data_tables, to which we pass a “size table” (how many codes are of length i?), and a “data table” (which value is code i mapped to?)6.

The Huffman module defines two structs:

#[derive(Debug, Clone)]
pub struct HuffmanCode {
    /// How many bits are used in the code
    length: u8,
    /// The bit code. If the number of bits used to represent the code is less
    /// than `length`, prepend `len-length` `0`s in front.
    code: u16,
    /// The value the code is mapped to.
    value: u8,
}

#[derive(Debug)]
pub struct HuffmanTable {
    /// A list of all codes in the table, sorted on code length
    codes: Vec<HuffmanCode>,
}

For simplicity, the actual table is just a Vec of HuffmanCodes7; we may see in a later post how to improve the performance here.

Creating the table is done in two steps. First we create a Vec with the length of each code, such that code i has length code_lengths[i]; remember that the size_data read from the file is the number of codes of each length, and now we want the length of each code. Then we create a Vec with the codes, by merging the three parts: code length, code bit string, and code value.

impl HuffmanTable {
    pub fn from_size_data_tables(size_data: &[u8], data_table: &[u8]) -> HuffmanTable {
        let code_lengths: Vec<u8> = (0..16)
            .flat_map(|i| repeat(i as u8 + 1).take(size_data[i] as usize))
            .collect();

        let code_table: Vec<u16> = HuffmanTable::make_code_table(&code_lengths);

        let codes: Vec<HuffmanCode> = data_table.iter()
            .zip(code_lengths.iter())
            .zip(code_table.iter())
            .map(|((&value, &length), &code)| {
                HuffmanCode {
                    length: length,
                    code: code,
                    value: value,
                }
            })
            .collect();

        HuffmanTable { codes: codes }
    }

    fn make_code_table(sizes: &[u8]) -> Vec<u16> {
        // This is more or less just an implementation of a
        // flowchart (Figure C.2) in the standard.
        let mut vec = Vec::new();
        let mut code: u16 = 0;
        let mut current_size = sizes[0];
        for &size in sizes {
            while size > current_size {
                code <<= 1;
                current_size += 1;
            }
            vec.push(code);
            if current_size > 16 || code == 0xffff {
                break;
            }
            code += 1;
        }
        vec
    }
}

Beware: make_code_table was previously purely an implmementation of the flow chart mentioned in the comment, but this was so ugly that I decided to implement it from scratch. It has not been thoroughly tested, but it seems to work as intended.

Now the table is read, processed, and put into its place. Next up is actually decoding image data.

StartOfScan

First we read in fields for the current scan. This includes eg. the number of components, and which tables they use. There is nothing fancy just yet; the data format is, as usual, listed in the JPEG specification.

Marker::StartOfScan => {
    // JPEG B.2.3
    let num_components = vec[i];
    let mut scan_components = Vec::new();
    for component in 0..num_components {
        scan_components.push(ScanComponentHeader {
            component_id: vec[i + 1],
            dc_table_selector: (vec[i + 2] & 0xf0) >> 4,
            ac_table_selector: vec[i + 2] & 0x0f,
        });
        i += 2;
    }

    let scan_header = ScanHeader {
        num_components: num_components,
        scan_components: scan_components,
        start_spectral_selection: vec[i + 1],
        end_spectral_selection: vec[i + 2],
        successive_approximation_bit_pos_high: (vec[i + 3] & 0xf0) >> 4,
        successive_approximation_bit_pos_low: vec[i + 3] & 0x0f,
    };
    // Register read data
    i += 4;

    if image.scan_headers.is_none() {
        image.scan_headers = Some(Vec::new());
    }
    image.scan_headers
        .as_mut()
        .map(|v| v.push(scan_header.clone()));

As we have seen, markers are of the format 0xff__. So what if the image data contains 0xff? The solution is to encode 0xff as 0xff00, which is not a marker code. Therefore we need to replace ff00 with ff before sending it to the huffman decoder, where we decode actual image data. Additionally we need to keep track of how many bytes we skip, in order to increment i correctly when we are done with this scan.

    // Copy data, and replace 0xff00 with 0xff.
    let mut bytes_skipped = 0;
    let mut encoded_data = Vec::new();
    {
        let mut i = i;
        while i < vec.len() {
            encoded_data.push(vec[i]);
            if vec[i] == 0xff && vec[i + 1] == 0x00 {
                // Skip the 0x00 part here.
                i += 1;
                bytes_skipped += 1;
            }
            i += 1;
        }
    }

Note that we are copying the entire image data here. Is this necessary? Not really. Is it slow? Uuh, maybe? It is simple? Hell yes.

After this we pass in the huffman tables and quantization matrices which we hopefully have read in an earlier segment. The code for this is nothing special, and somewhat ugly, so it has been omitted. At last, we decode the image, and advance the index appropriately.

    let (image_data, bytes_read) = jpeg_decoder.decode();
    image.image_data = Some(image_data);

    // Since we are calculating how much data there is in this segment,
    // we update `i` manually, and `continue` the `while` loop.
    i += bytes_read + bytes_skipped;
    continue;
}

Note that we assume the image contains only one scan. As I mentioned in Part 1, all images I have tested containst just that, so this is an assumption we will continue to make. If we are afraid of forgetting this, we could always add an assert!(image.image_data.is_none()) at the top of the StartOfScan block.

So what happens in JpegDecoder::decode? That will have to wait for the next part.


  1. I know images with certain sampling factors are messed up. If possible, check identify -verbose <image.jpeg> | grep sampling (requires ImageMagick). Only 1x1 and 2x1 are supported. ↩︎

  2. When developing, I did not use this image, because of its size (decoding this actually takes almost two seconds on my laptop!), and its complexity (multiple channels, present scaling factors, etc.). Rather, I tested the decoder with a 16x8 grayscale version of the same image, and advanced to a full size grayscale image of Lena, when the small image showed correctly. ↩︎

  3. Note that while this is the JFIF spec, the markers used are defined in the JPEG spec. ↩︎

  4. /u/DroidLogician pointed out a huge performance flaw in the original file reading code. Check the reddit thread out! ↩︎

  5. If you think my ASCII skills are nonexistent, check page 47 (marked 43) in the JPEG spec. ↩︎

  6. The naming here is the same as what is used in the spec. I think there are room for improvements, but I have yet to come up with better names. ↩︎

  7. By now, you might see how I plan to use the Huffman table to decode data. In retrospect, constructing an actual tree and traversing it with a bit stream iterator of sorts would perhaps have been a better idea, although it would have been more work up front. ↩︎

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License