[小项目]手把手教你C语言哈夫曼压缩解压缩_C

前言

这是大一写过的一个小项目，现在大三，重新实现了一下。这是原来的链接，可以看一下效果，思路和现在的一样。
扩展性、健壮性比原来更好，思路也更清晰了。当时只想花里胡哨，这次重心放在质量、功能上。
后续会不断改进它，直到它贴近实际。

项目分析

项目围绕实现压缩、解压缩。
我们遵循模块间低耦合、模块内高内聚的原则来分析；代码尽量保证可移植性。

模块划分

压缩/解压缩均通过哈夫曼编码算法来实现，所以我们的第一个模块为算法模块。
实际的任务流程需要一个模块来控制，负责流程的控制，提供简单可用的接口，划分为接口模块。
在进行编码时，我们需要进行字节和位串的转换，这就需要位的 *** 作，C语言没有提供这样的函数，需要自己实现，所以划分为位 *** 作模块。
对于任一个模块，我们希望给它一个输入，产生一个结果，而不用它考虑输入来自哪里、输出到了哪里。所以我们需要一个流处理模块来集中解决这个问题，这样也可以降低各模块的耦合程度。
为了方便，我们需要一个打印错误信息的模块，就把它作为错误处理模块。
为了出现错误时更方便排查，我们增加一个测试模块，实际上是将各个模块节点的IO进行记录，方便对比排查。

将项目划分为以下几个模块：

算法模块
接口模块
流处理模块
错误处理模块
位 *** 作模块
测试模块

实测中需要频繁地使用缓冲区，包括字符串缓冲区、字节缓冲区、位缓冲区，每次都要实现一遍非常不方便，好在数量不多，后续会增加缓冲区模块。

模块功能分析 1.算法模块

算法模块的任务是通过哈夫曼编码得到编码表，输入和输出进行功能分析。这个模块聚合程度很高，我们尽可能不去动它。

统计字节频率
- 编码需要构造哈夫曼树，而构造树需要有weight（权值），而权值需要扫描输入流来进行统计，显然这一工作与编码独立，所以我们将它作为单独功能。
构造哈夫曼树
- 不用说，为编码做准备。
哈夫曼编码
- 得到哈夫曼编码表。
- 这里的表是抽象类型，实际上用二级指针数组存储编码字符串指针。
- 编码的结果写入输出流，流均由流处理模块指定。

2.接口模块

这个模块负责将各个模块连接，提供压缩、解压缩的接口。

3.流处理模块

这个模块负责压缩文件写入的格式、解析压缩文件、错误打印格式等。

4.错误处理模块

较为简单，但注意打印的消息要写入标准错误流（STDERR）而非标准输出流（STDOUT）。

5. 位 *** 作模块

提供位 *** 作。

6.测试模块

这个模块虽然名义上为模块，但实际上却渗透到各个模块中，我们通过宏定义开关来降低它与其他模块的耦合度。

流程图

这里主要描述压缩、解压、压缩文件格式。

压缩逻辑流程图解压逻辑流程图名词/行为解释

压缩
- 实际上是将原字节替换为哈夫曼编码。
解压
- 压缩的逆 *** 作，即将哈夫曼编码替换为原字节码。
余码
- 由于编码总长度不是8的整数倍而多余出来的码。

压缩文件格式

压缩文件包括：

文件头
- 余码
- 原文件大小
编码表
压缩数据

例子

假设有个压缩文件1.hfm，那么在文件中的数据是这样的：

10000   
1824                
00	11111111011010110000
01	11111111011010110001
02	11111111011010110010
03	11111111011010110011
04	11111111011010110100
05	11111111011010110101
......
FE	1111111101101010110
FF	1111111101101010111
�fn�ղ�R_p�[/�n��`_�����]W�CI{z��<���P����kK��O������!&��,t䛮���p������
��a`nP
......

第一行是余码；第二行是原文件字节数；接下来的256行（[0, 255]）是编码表；最后的N行是压缩后的数据，没错，它是乱七八糟的。

代码解读

实际实现时，我们按照由易到难、由具体到抽象的顺序来实现。
很明显，错误处理模块、位 *** 作模块、算法就很具体，而流格式化的模块就相对抽象。

错误处理模块

用到了变参函数。
err.c

#include "err.h"

#include 
#include 
#include 

void errPrint(const char *format, va_list arg_list)
{
    char buf[ERR_BUF_SIZE];
    vsnprintf(buf, sizeof(buf), format, arg_list);
    fprintf(stderr, "%s", buf);
}

void errExit(const char *format, va_list arg_list)
{
    errPrint(format, arg_list);
    exit(EXIT_FAILURE);
}

void errCaller(void (*errFunc)(const char *, va_list), const char *format, ...)
{
    va_list arg_list;
    va_start(arg_list, format);
    errFunc(format, arg_list);
    va_end(arg_list);
}

err.h

#ifndef ERR_H
#define ERR_H

#define ERR_BUF_SIZE 4096
#include 

void errPrint(const char *format, va_list arg_list);
void errExit(const char *format, va_list arg_list);
void errCaller(void (*errFunc)(const char *, va_list), const char *format, ...);
#endif

位 *** 作模块

bit.c

#include "bit.h"

#include 
#include 
#include 
#include 

#include "err.h"
void setbit(Byte *pbyte, size_t ordinal, bool value)
{
    // ordinal is [0,7], oridinal is from RIGHT to LEFT.
    // bit&1 或者 bit|0，不改变值。bit&0 = 0， bit|1 = 1
    Byte mask_AND_to_0[8] = {
        0xFE,  // 0b11111110,
        0xFD,  // 0b11111101,
        0xFB,  // 0b11111011,
        0xF7,  // 0b11110111,

        0xEF,  // 0b11101111,
        0xDF,  // 0b11011111,
        0xBF,  // 0b10111111,
        0x7F   // 0b01111111
    };
    Byte mask_OR_to_1[8] = {
        0x01,  // 0b00000001,
        0x02,  // 0b00000010,
        0x04,  // 0b00000100,
        0x08,  // 0b00001000,
        0x10,  // 0b00010000,
        0x20,  // 0b00100000,
        0x40,  // 0b01000000,
        0x80   // 0b10000000
    };
    if (value == 0)  // set to 0
        (*pbyte) &= mask_AND_to_0[ordinal];
    else if (value == 1)
        (*pbyte) |= mask_OR_to_1[ordinal];
}
int getbit(Byte byte, size_t oridinal)
{
    return (byte >> (oridinal - 1)) | 0x00;
    //或者 return (byte >> (oridinal -1)) & 0x1;
}

static size_t rest_bit_quantity(Byte *bytebit_buf, size_t buf_size, size_t byte_end, size_t bit_end)
{  //辅助函数，return the rest quantity of bits of bytebit_buf.
    return (buf_size - byte_end) * 8 - bit_end;
}

// byte_end is the index of the last byte not full to 8 bits
// bit_end is the index of the end of bytebit_buf[byte_end], not saving data, just like '// bitcat() will trans _01_str to bit and join it to bytebit_buf.' for  string.
bitcat
bool (*Byte ,bytebit_bufsize_t , buf_sizesize_t * ,byte_endsize_t * ,bit_endconst char * )_01_strsize_t
{
    ; iif
    ( strlen()_01_strrest_bit_quantity > (,bytebit_buf, buf_size* ,byte_end* )bit_end)return
        ; false//缓冲区已满，无法存入整个01字符串，不存入缓冲区，返回false  for
    ( =i 0 ;[ _01_str]i!= ';' ++) setbiti( {
        &([*bytebit_buf])byte_end,7- * , [bit_end] _01_str-i'0' ) ;++(
        *);bit_end// bit_end加1if        (
        * 7)bit_end > // bit满8,byte_end+1，bit_end归零( {  *
            )%=bit_end8 ; ++(
            *);byte_end}}
        return
    ;
    } true#
ifndef

bit.h

BIT_H# define
BIT_H# include

#include 
typedefunsigned 

char ; # Bytedefine
BYTE_NUM256 //[0,255],2^8个 #  define
BYTE_MIN0 # define
BYTE_MAX255 void setbit

( *,Byte size_tpbyte, ) ordinal; bool valueintgetbit
( ,size_tByte byte) ; oridinalbitcat(
bool *,Byte size_tbytebit_buf, size_t buf_size* , size_tbyte_end* , constbit_endchar * ) ;_01_str#endif
mFclose()NULL

包裹函数

在实现剩下的模块之前，我们将一些函数封装成包裹函数，这样不用频繁验证返回值，减少分散我们的注意力。

除了包裹函数还有以下两个自定义函数：

itoa_()，即many Fclose，可以一次性关闭多个文件（注意最后一个参数必须是itoa()，否则会运行时错误而中止程序）。
min_length，由于linux下没有itoa()函数，便自己实现了一下，并且扩展了它的功能，可选补齐前导0 （通过FILE*参数）。在字节转位串的时候非常方便。加一条_是为了在其他系统编译时不与标准库的FILE冲突。

由于File无法得知打开的文件名，在报错误处理时很不方便，为了保留文件名我们将#封装为include，并为它适配了相关的包裹函数。

pkg.c

"pkg.h"# include

#include 
#include 
#include 
#include 

"err.h"* Fopen

File (constchar* , constfilenamechar * ) *mode=
{
    File (file * )File malloc(sizeof())File;=fopen
    file->pfile ( ,)filename; modeif(
    == NULLfile->pfile ) errCaller( ,ERR_MSG_FOPENerrExit( ))file;snprintf(
    ,sizeoffile->filename( ),file->filename"%s", ); filenamesnprintf(
    ,sizeoffile->mode( ),file->mode"%s", ); modereturn;
    } fileint
Feof
( *)File returnstreamfeof { ( );stream->pfile}void Rewind
( *)File rewindstream( { );stream->pfile}void Fclose
( *)File ifstream(
{
    == NULLstream ) return; if(
    fclose ()!=stream->pfile0 ) errCaller( ,ERR_MSG_FCLOSEerrExit( ))stream;free(
    );stream}void
mFclose

( *,File .stream. .)// File* stream_1, File* stream_2 ...,NULL;  va_start
{
    va_list arg_list(
    ,)arg_list; streamFclose(
    );streamfor(
    ; ;)=va_arg {
        stream ( ,*arg_list) File ;if(
        == NULLstream ) break; Fclose(
        );stream}va_end
    (
    );arg_list}ssize_t
Fread
( void*, size_tmem, size_t elem_size, * elem_count) File returnistreamfread
{
    ( ,,mem, elem_size) elem_count; istream->pfile}ssize_t
Fwrite
( void*, size_tmem, size_t elem_size, * elem_count) File ssize_tostream=
{
    fwrite count ( ,,mem, elem_size) elem_count; ostream->pfileif(
    < )count errCaller elem_count( ,ERR_MSG_FWRITEerrExit( ))ostream;return;
    } countint
Fscanf
( *,File constistreamchar * , .format. .);va_start
{
    va_list arg_list(
    ,)arg_list; formatint=
    vfscanf ret ( ,,istream->pfile) format; arg_listva_end(
    );arg_listreturn;
    } retint
Fprintf
( *,File constostreamchar * , .format. .);va_start
{
    va_list arg_list(
    ,)arg_list; formatint=
    vfprintf ret ( ,,ostream->pfile) format; arg_listva_end(
    );arg_listreturn;
    } retchar
*
itoa_ (size_t,char value* , size_tresult, size_t radix) const min_lengthchar
{
    [ ] digits="0123456789abcdef" ; char[
    66 buf];// 65 is the binary length of 2^64;const  size_t
    = sizeof end ( )-buf1 ; constsize_t
    = - prefix_0_beg ; end size_t min_length=
    ; beg [ end]
    buf=end';' size_t ,;
    for digit( i;
    0 ;) value > =%; {
        digit [ value -- radix]
        buf=[beg] ; digits/=digit;}
        value for radix(
    =
    ; <i ; prefix_0_beg++ i ) beg//不到最小长度的补前缀0至最小长度,这对不满8位的转换相当重要。 [i]  =
        buf'0'i; if ()
    = ;beg > prefix_0_begstrcpy beg ( prefix_0_beg,
    &[result] )buf;begstrcat(,
    ")";resultreturn ;}#
    ifndef resultPKG_H
#

pkg.h

definePKG_H #
include# define

countof( 

)( sizeof(array) /sizeof(array[ 0 ])array)#defineFILENAME_SIZE512

#define MODE_SIZE 4
typedefstruct File *
; char [ {
    FILE ]pfile;
    char filename[FILENAME_SIZE];
    } mode;MODE_SIZE// the type of pfile is File*.#
define FileERR_MSG_FOPEN

(
)"Failed to open file %s." ,#pfiledefine ERR_MSG_FCLOSE( pfile->filename
)"Failed to close file %s." ,#pfiledefine ERR_MSG_FWRITE( pfile->filename
)"Failed to write the whole memory to file %s." ,intpfileFeof (* pfile->filename

) ;voidFile Rewindstream(*
) ;*File Fopenstream(const
File char*,const char *file) ; void Fclosemode(*
) ;intFile Fscanfstream(*
, constcharFile *istream, . . .format) ;intFprintf(*
, constcharFile *ostream, . . .format) ;voidmFclose(*
, ..File .stream) ;// File* stream_1, File* stream_2 ...,NULLssize_tFread(  void
* ,size_t, size_tmem, * elem_size) ; elem_countssize_t File Fwriteistream(void
* ,size_t, size_tmem, * elem_size) ; elem_countchar File *ostreamitoa_(
size_t ,char*, valuesize_t , size_tresult) ; radix# endif min_length变量名是越长越好，还是越短越好？太长读代码就像阅读题，太短则像文言文，我宁愿它长一点。
用驼峰还是下划线？考虑到变量名长，下划线法单词辨识度应该更高。

#include

算法模块

到这里有两问题一直很纠结，

"huffman.h"

huffman.c

include# include

#include 
#include 
#include 
#include 
#include 
"bit.h"# 

include"err.h" #
include"huffman.h" static
voidselect_min (
, const size_t,huffman_tree treesize_t * , curr_node_indexsize_t * )min_1size_t , =min_2;
{
    for i( min_weight = SIZE_MAX0
    ; <i ; ++) i if curr_node_index( [i]
        . ==tree&&i[]parent . HAVE_NO_PARENT < tree)i=[weight ] min_weight. {
            min_weight ; tree*i=;weight}
            formin_1 ( i=
        0
    , =i ; <; min_weight ++ SIZE_MAX) i if curr_node_index( [i]
        . ==tree&&i[]parent . HAVE_NO_PARENT < tree&&i!=*weight ) min_weight = i [ ]min_1. {
            min_weight ; tree*i=;weight}
            }min_2 //这里pTree是树的指针，因为要在外部保存树 ivoid
        create_huffman_tree
(

*
, constsize_thuffman_tree *pTree, size_t ) size_tweight, , weight_elem_num;
{
    const isize_t min_1= min_2;
    //叶子结点数为权值数 const num_leafNode size_t weight_elem_num=  2
    * - num_allNode 1 ; * num_leafNode = ()

    mallocpTree ( *huffman_treesizeof()num_allNode ) ;// huffman tree 结点数组huffman_tree_node=*;  for
    huffman_tree tree ( =pTree0

    ; <i ; ++) i [ num_allNode] .i= tree;i//初始时没有父结点，设为-1forparent ( HAVE_NO_PARENT=  0
    ; <i ; ++) i [ num_leafNode] .i= tree[i];weight // leaf node 权值载入 weight//[0, n-1]是n个叶子结点，[n,2n-1)（[n,2n-2]、[n, m)）是n-1个双分支结点.ifor(      =

    ;
    < ;i ++ num_leafNode) i // i是当前结点，min_1、min_2是权值最小的两个结点 num_allNodeselect_min (i, {  ,
        &,tree& i) ;min_1[ ]min_2.=
        tree;min_1[]parent . i=
        tree;min_2[]parent . i=
        tree;i[]lchild . min_1=
        tree;i[]rchild . min_2=
        tree[i].weight + tree[min_1].weight ; tree}min_2}charweight*
    *
huffman_encode
( const,size_t)size_t huffman_tree tree, , num_leafNode,
{
    ; parentconst currsize_t i= start;
    char * n * num_leafNode=
    ( char*encode_result * )malloc (*sizeof(charn * )); char*=(
    char *one_code ) alloca( *sizeof(charn ) );//分配编码工作空间[-1  ]
    one_code=n ';' for( = 0;
    < ;i ++ )//遍历tree中所有结点 i = n; =i- {  1
        curr ; i=
        start [ n ] .;
        parent while tree(curr!=)parent//遍历该节点的父结点，一直找到根结点，即没有父结点的结点
        if (parent [ HAVE_NO_PARENT] {  .
            == )tree//当前结点位于左分支parent[--lchild ] curr=  '0'
                one_code;elsestart[ -- ]=
            '1'
                one_code;=start; //继续上行找父节点 =[
            curr ] parent.  ;
            parent } tree[parent]=parent(
        char
        encode_result*i) malloc (( -)*sizeof(n char start) ) ;//动态分配编码长度strcpy([]  ,
        &[encode_result]i); }one_codereturnstart;// static variable encode_result shold be free outside.}
    void
    huffman_decode encode_result(  char
*

, size_t*, const_01_strchar * *_01_str_end, * , size_t*encode_result) Byte // result_end is the first byte not saving data, like 'size_t'.result,
                    ; *result_end=
{  0
    ; beg; ifor
    (result_end = 0;
    bool find[
    ] !=beg ';' )for _01_str(beg= 0 ,=; {
        < ;i ++ )size_t find = falsestrlen i ( BYTE_NUM[ ]i) {
            ; len if (strncmpencode_result(i[],
            & [],encode_result)i==0 )_01_str//匹配成功beg[* len] = ;++ {  (
                result*)result_end; += i;
                =;breakresult_end;}
                beg } lenif
                find ( true!
                )break
            ;
        //说明一个都没匹配到，结束
        } char[find9 ];  strcpy
    (
    , tmp_str&[])
    ;//剩下的复位到首部tmp_strstrcpy (_01_str,beg);*  =
    strlen(_01_str) tmp_str;}
    #_01_str_end ifndef HUFFMAN_H#_01_strdefineHUFFMAN_H
#

huffman.h

defineHAVE_NO_PARENT -
1# include

"bit.h"typedef struct huffman_node_data_type;
}; typedef
struct huffman_tree_node ; {
    Byte datasize_t
; huffman_node_data_typeint

, , ; {
    huffman_node_data_type data}
    , weight*
    ; lchildvoid rchildcreate_huffman_tree parent(
* huffman_tree_node, consthuffman_treesize_t

* ,size_thuffman_tree )pTree; char * *weighthuffman_encode ( weight_elem_numconst,
size_t );voidhuffman_decode( huffman_tree treechar * num_leafNode,size_t
* ,constchar *_01_str* , *_01_str_end, size_t * );encode_result# Byte endifresult#
                    include #result_endinclude#
include#

接口模块

interface.c

include"bit.h" 
#include 
"huffman.h"# 

include"pkg.h" #
include"stream_manager.h" void
compress( *
,* ,

const char*File *istream) File [ostream] ; // 此处要保证IO_BUF_SIZE不小于strlen(one_code)，否则永远存不下，永远失败. []encode_result;
{
    Byte read_bufssize_tIO_BUF_SIZE;size_t  ,
    Byte bytebit_buf,BYTE_BIT_BUF_SIZE,;
    reserve_header count(
    ) i; byte_endoutput_huffmanCode bit_end( origin_size,

    );ostream=0
    ;forostream( encode_result==
    origin_size 0 ;!
    Feof (byte_end ) bit_end ; )= Fread(,istreamsizeof([ {
        count 0 ])read_buf, countof(read_buf),);if (<=read_buf0) istreamcontinue;
        += *count sizeof ([ 0]
        origin_size ) count ; //统计原来大小forread_buf(=0;<  ;
        ++ )i const char* i = count[ [i] {
            ] ; ifone_code ( encode_resultbitcatread_buf(i,sizeof(
            ) ,&,bytebit_buf& ,)bytebit_buf)continue ;byte_endelse //连接失败，说明缓冲区已满bit_endfflush_bytebit_buffer one_code(,
                &,
            ) {                                                       ;
                //刷新缓冲区--bytebit_buf; //抵消++i，使得one_code被重写byte_end} ostream}}  fflush_bytebit_buffer
                (i,                                                     &
            ,
        )
    ;
    //所有字符转换完了，再刷新一次.charbytebit_buf[ 9byte_end] ostream;itoa_  (
    [ surplus],,2
    ,0bytebit_buf)byte_end;[ surplus] =';' //截断非有效bitfill_header(
    surplus,bit_end, ) ;}  void
    decompress(ostream* surplus, origin_size*)
char
[ 9]File ;istreamsize_t File ;ostreamconst
{
    char surplus**=(
    const origin_sizechar
    * * )parse_compress_headerencode_result ( ,, & );[],istream[ surplus] ;origin_sizechar[
    Byte read_buf]IO_BUF_SIZE;size_t decode_buf,IO_BUF_SIZE,;
    ssize_t _01_str;_01_STR_BUF_SIZEfor(
    = i= _01_str_end0 decode_buf_end;
    ! countFeof
    ( )decode_buf_end ; _01_str_end ) =Fread (,sizeofistream([0 {
        count ] ),read_bufcountof ()read_buf,);if( <=0read_buf)continue istream;for
        ( =count 0 ;< ;)
        if (i sizeof () i - count1- {
            8 )//足8位itoa__01_str( [ ] , _01_str_end >= &[ {         ]
                ,2read_buf,i8) ;_01_str//连接成01字符串_01_str_end+=8 ;++ ;}else  // 01_str缓冲区已满，刷新缓冲区
                _01_str_end huffman_decode (,
                &i,
            , , {  &
                );_01_strFwrite (_01_str_end, encode_resultsizeof decode_buf( [decode_buf_end0]
                ),decode_buf, );decode_buf}}}strcat( decode_buf_end, ostream);
            huffman_decode
        (
    ,
    &,_01_str, surplus,&
    );_01_strFwrite (_01_str_end, encode_resultsizeof decode_buf( [decode_buf_end0]
    ),decode_buf, );decode_buf}#ifndefINTERFACE_H# decode_buf_enddefine ostreamINTERFACE_H#
include

interface.h

"pkg.h"void compress
(* ,

*, const
char **File )istream; File voidostreamdecompress ( * ,*encode_result);
# endif#File includeistream"stream_manager.h" File #ostreaminclude#
include#

流处理模块

stream_manager.c

include# include

#include 
"bit.h"# 
include"err.h" 
#include 

"pkg.h"void count_byte_weight
(* ,
size_t* ,

size_t )size_tFile ;istreammemset ( ,byte_times0 , byte_times_size)
{
    ; i//字节集归零
    []byte_times; // save the bytes read.ssize_t byte_times_size;for  (
    Byte bytes;IO_BUF_SIZE!Feof                 (
    ) count;
    ) =Fread (,sizeofistream([0 {
        count ] ),bytescountof ()bytes,);if( <=0bytes)continue istream;for
        ( =count 0 ;< ;++
        ) // tarverse all the bytes read, 将对应字节值计数加一i ++ [[ i ] count] ;i} {  }
            }byte_timesvoidbytesoutput_huffmanCodei(*,
        const
    char
*

* )size_tFile ;ostreamconst size_t = ;forencode_result(
{
    = i0
    ; < num_leafNode ; BYTE_NUM++
    ) Fprintfi ( ,, i , num_leafNode[ ]i) {
        ;}ostream} O_FORMAT_BODY_HUFFMAN_CODEchar i* encode_result*iparse_compress_header(*
    ,
char

* ,size_t*)File Fscanfistream( , ,surplus) ; Fscanforigin_size(
{
    ,,istream) I_FORMAT_HEADER_SURPLUS; surpluschar*
    *=istream( I_FORMAT_HEADER_ORIGIN_SIZEchar origin_size**
    ) malloc(encode_result sizeof (char *)*);char[] ;size_t , BYTE_MAX;for
    ( one_code=HUFFMAN_CODE_MAX_LEN0;
    <= i; unused++
    ) Fscanfi ( ,, i & BYTE_MAX, )i; {
        []istream= I_FORMAT_BODY_HUFFMAN_CODE( charunused* one_code)malloc
        encode_result(istrlen ( )* sizeof(char));one_codestrcpy ( [],);}
        return;encode_result// should be free outside.i}ssize_t one_codefflush_bytebit_buffer(
    *
    , encode_resultsize_t  *
,
* )// bytebit_buf[byte_end]是未填充满的字节，此处只写入填充满的字节，即byte_end-1个字节Byte ssize_tbytebit_buf= Fwrite (byte_end, File sizeofostream(
{  [
    0 count ] ),bytebit_buf* ,)bytebit_buf;[0]= [byte_end* ostream];
    bytebit_buf//把结尾未满8位的bit复制到开头*= 0 bytebit_buf;//缓冲区归零byte_endreturn;  }
    voidbyte_end reserve_header (*                            )
    Fprintf count(
,

, "")File ;ostream// surplus
{
    Fprintf(ostream, O_FORMAT_HEADER_SURPLUS, (long)           0
    );ostream// origin_size O_FORMAT_HEADER_ORIGIN_SIZE} staticvoidrewind_header(*)  rewind
(
) ; }voidFile fill_headerostream( { *,ostream->pfileconstchar *
, size_t)File rewind_headerostream( ) ; Fprintfsurplus( , origin_size,
{
    );ostreamFprintf(
    ,,ostream) O_FORMAT_HEADER_SURPLUS; surplus}#
    ifndefSTREAM_MANAGER_Hostream# O_FORMAT_HEADER_ORIGIN_SIZEdefine origin_sizeSTREAM_MANAGER_H#
include

stream_manager.h

"bit.h"# include
"pkg.h"# define

HUFFMAN_CODE_MAX_LENBYTE_NUM #
defineIO_BUF_SIZE 4096

#define BYTE_BIT_BUF_SIZE 4096
#define _01_STR_BUF_SIZE 4096
#define I_FORMAT_HEADER_SURPLUS "%8s\n"
// surplus# define O_FORMAT_HEADER_SURPLUS

"%-8s\n"// surplus # define           I_FORMAT_HEADER_ORIGIN_SIZE
"%20ld\n"// origin_size # define          O_FORMAT_HEADER_ORIGIN_SIZE
"%-20ld\n"// origin_size # define     I_FORMAT_BODY_HUFFMAN_CODE
"%02lX\t%s\n"// byte, huffman_code # define    O_FORMAT_BODY_HUFFMAN_CODE
"%02lX\t%s\n"// byte, huffman_code void count_byte_weight  (
*, size_t *  ,

size_t );File voidistreamoutput_huffmanCode ( *byte_times, const byte_times_sizechar*
* );File charostream* * parse_compress_header (*encode_result,char
* ,size_t*)File ;istreamssize_t fflush_bytebit_buffer (surplus* , size_torigin_size*,
* );Byte voidbytebit_bufreserve_header ( *byte_end) File ;ostreamvoidfill_header
( *,File constostreamchar*
, size_t)File ;ostream# endif # includesurplus# include origin_size"bit.h"#

include"err.h"

main函数

main函数给了一个使用的例子。
main.c

#include 

"huffman.h"# include
"interface.h"# include
"pkg.h"# include
"stream_manager.h"int main
(int ,
char* *
) *,* argc; const char*argv=
{
    File [istream1 ]ostream,
    * = [src_file 2 argv];//统计字节数= Fopendest_file ( argv,"r");

    size_t
    istream [ ];src_filecount_byte_weight (,,
    sizeof byte_times(BYTE_NUM))
    ;//压缩istream; byte_timescreate_huffman_tree (&byte_times,,countof
    (
    huffman_tree tree)
    );consttreechar byte_times* *=byte_times(constchar
    * * )huffman_encodeencode_result_1 ( ,) ; Rewind();=treeFopen BYTE_NUM(,
    "w")istream;compress
    ostream ( ,,dest_file) ;free(
    );istreammFclose ostream( encode_result_1,,
    NULL)encode_result_1;//解压
    constcharistream[ ostream] ="decompress.txt";
    =
    Fopen ( new_file,"r" ) ;=
    istream Fopen (,dest_file"w" );decompress
    ostream ( ,)new_file; mFclose(,
    ,NULListream) ostream;return
    0;istream} ostream= ===
    = =$(
)

makefile

CC $( gcc
BIN ) main
OBJ $( bit.o err.o huffman.o interface.o main.o pkg.o stream_manager.o

LIB )
INC $(
FLAGS ) $(INC) -Wall

main: $(OBJ)
	@$(CC) $(OBJ) -o $(BIN) $(LIB)
bit.o: bit.c bit.h err.h
	@$(CC) -c bit.c -o bit.o $(FLAGS)
err.o: err.c err.h
	@$(CC) -c err.c -o err.o $(FLAGS)
huffman.o: huffman.c huffman.h bit.h err.h
	@$(CC) -c huffman.c -o huffman.o $(FLAGS)
interface.o: interface.c bit.h pkg.h stream_manager.h
	@$(CC) -c interface.c -o interface.o $(FLAGS) 
main.o: main.c bit.h huffman.h interface.h pkg.h stream_manager.h
	@$(CC) -c main.c -o main.o $(FLAGS)
pkg.o: pkg.c pkg.h err.h
	@$(CC) -c pkg.c -o pkg.o $(FLAGS)
stream_manager.o: stream_manager.c stream_manager.h pkg.h bit.h err.h
	@CC -c stream_manager.c -o stream_manager.o FLAGS

.PHONY: clean
clean:
	@rm OBJ BIN

测试模块

这个部分我还没有实现，但是测试了一个618M的mkv视频文件。

可以看到，100%地还原了。但是压缩后的文件更大了

这是因为视频文件本身就是压缩格式的，再压缩它也没有效果。
上次做的压缩率在70%~75%，这次的理论上要高一点，因为为了赶工减少了一些优化步骤，后面会再改进。

后续

上面的代码在我的64位 debian11上编译0报错0警告，gcc版本为10.2.1。
windows上没有再测试了，应该也能通过。

同样也能看到，压缩+解压速度慢得惊人，23min，这是我们不可能接受的。
后续会进行效率提升，应该会用到多线程/多进程。再进一步的，字符串处理可以用内联汇编改写。无论是哪种措施，无疑会降低我们程序的可移植性。
为了提高移植性，后面或许会选择C++，虽然我不怎么用C++。

欢迎分享，转载请注明来源：内存溢出

原文地址: http://outofmemory.cn/langs/2889260.html

[小项目]手把手教你C语言哈夫曼压缩解压缩

发表评论

评论列表（0条）