JPG压缩的第4步是哈夫曼编码。下面主要介绍JPEG是如果进行哈夫曼编码的。

图片引用自"Compressed Image File Formats JPEG, PNG, GIF, XBM, BMP - John Miano"[1]
1.AC数据的哈夫曼Symbol.
对于AC数据而言,需要编码的前4位代表这个数据之前有多少个0,后4位代表当前值的Magnitude Value。

AC数据的编码是以ZigZag的顺序进行的。
如下图为例,从1开始前面有0个0, 数值大小为1, magnitude value为1,需要编码的symbol为0x01;
接着走到3处,前面有4个0,数值大小为3, magnitude value 为2,需要编码的symbol为0x42;
以此类推:
唯一有的2个额外的情况
0x00代表后面的数据都为0
0xF0代表16个0

总共的symbol数量 = (为0的个数)16 * 10(不同的maginitude) + 2 (特殊情况) = 162。
2.DC数据的哈夫曼Symbol
DC数据存的是difference,即当前Block的DC值减去上一个Block的DC值。
如下可知DC symbol总共有12个

3.JPEG默认哈夫曼编码
JPEG提供了默认的huffmanTable(emprically good)[2],如下



引用"https://www.impulseadventure.com/photo/optimized-jpeg.html"
也可以自己根据图片生成huffmanCode,代码如下
- void generateHuffmanCode() {
- std::vector<LinkedSymbol> symbols;
- //遍历每个出现的symbol, add to vectors
- for(uint symbol = 0; symbol < 256; symbol++) {
- if(countOfSymbol[symbol] == 0)
- continue;
- Symbol* s = new Symbol(symbol, countOfSymbol[symbol], 0, nullptr);
- LinkedSymbol linkedSymbol;
- linkedSymbol.symbol = s;
- linkedSymbol.weight = s->weight;
- symbols.push_back(linkedSymbol);
- }
-
-
- // FF是一个不会出现的symbol,作为我们的dummy symbol 防止one bit stream 的出现 比如11111, 这样就可以防止compressdata中出现FF的可能
- Symbol* dummySymbol = new Symbol(0xFF, 1, 0, nullptr);
- LinkedSymbol dymmyLinkedSymbol;
- dymmyLinkedSymbol.symbol = dummySymbol;
- dymmyLinkedSymbol.weight = dummySymbol->weight;
- symbols.push_back(dymmyLinkedSymbol);
-
-
- //合并的过程
- while(symbols.size() != 1) {
-
- //leastWeight
- LinkedSymbol least = getLeastWeightLinkedSymbol(symbols);
- //second Least Weight
- LinkedSymbol second = getLeastWeightLinkedSymbol(symbols);
- //add two weights
- least.weight = least.weight + second.weight;
- //linked two linkedsymbols;
- Symbol* temp = second.symbol;
- while(temp->nextSymbol != nullptr)
- temp = temp->nextSymbol;
- temp->nextSymbol = least.symbol;
- least.symbol = second.symbol;
- //把每个symbol加1 codeLength,并且加入到
- for(auto i = least.symbol; i != nullptr; i = i->nextSymbol) {
- i->codeLength++;
- }
- symbols.push_back(least);
- }
- //放入sortedSymbols
- for(Symbol* i = symbols[0].symbol; i != nullptr; i = i->nextSymbol) {
- sortedSymbol.push_back(*i);
- }
- //排序,并且把dummy symbol 放在最后面;
- std::sort(sortedSymbol.begin(), sortedSymbol.end(), comp);
- //释放内存
- Symbol* temp = symbols[0].symbol;
- while(temp != nullptr) {
- auto t = temp->nextSymbol;
- delete temp;
- temp = t;
- }
- //长度为n的code的个数
- //生成codeLengthCount for each codeLength;
- for (auto it = sortedSymbol.cbegin(); it != sortedSymbol.cend(); it++) {
- codeCountOfLength[it->codeLength]++;
- }
- //规定codeLength不能大于16, 套用书上的方法实现了一下
- for(uint ii = 32; ii >= 17; ii--) {
- while(codeCountOfLength[ii] != 0) {
- uint jj = ii - 2;
- while(codeCountOfLength[jj] == 0)
- jj--;
- codeCountOfLength[ii] = codeCountOfLength[ii] - 2;
- codeCountOfLength[ii - 1] = codeCountOfLength[ii - 1] + 1;
- codeCountOfLength[jj + 1] = codeCountOfLength[jj + 1] + 2;
- codeCountOfLength[jj] = codeCountOfLength[jj] - 1;
- }
- }
- uint index = 1; //codeLength赋值回去
- for (auto it = sortedSymbol.begin(); it != sortedSymbol.end(); it++) {
- if(codeCountOfLength[index] != 0) {
- it->codeLength = index;
- codeCountOfLength[index]--;
- } else {
- index++;
- it--;
- }
- }
-
- //生成huffmanCode for each symbol
- uint huffmanCode = 0;
- uint currentLength = 1;
- for (auto it = sortedSymbol.begin(); it != sortedSymbol.end(); it++) {
- if(currentLength == it->codeLength) {
- it->code = huffmanCode++;
- codeOfSymbol[it->symbol] = it->code;
- codeLengthOfSymbol[it->symbol] = it->codeLength;
- } else {
- huffmanCode = huffmanCode << 1;
- currentLength++;
- it--;
- }
- }
- }
全部代码在https://github.com/Cheemion/JPEG_COMPRESS/tree/main/Day5
完结
Thanks for reading.
>>>> JPG学习笔记6
参考资料
[1]https://github.com/Cheemion/JPEG_COMPRESS/blob/main/resource/Compressed%20Image%20File%20Formats%20JPEG%2C%20PNG%2C%20GIF%2C%20XBM%2C%20BMP%20-%20John%20Miano.pdf
[2]https://www.impulseadventure.com/photo/optimized-jpeg.html