# How does ‘dictionary size’ affect compression?

compression

I know that a larger dictionary size may lead to a better compression ratio and vice versa. But is there a better way to decide, since there are so many choices?

So far I've noticed that dictionary size ≈ file size yields the best compression.

Here the ∼8 MB file test.avi has the same compression ratio for all dictionary sizes greater than 8 MB. Below that, the ratio starts to fall.

Repeated patterns are stored in a dictionary, and each one is assigned a code as a substitute.

THIS IS AN OVERSIMPLIFICATION

aaaaaaaaaaaaaaaaaaaaaaaa  0001
bbbbbbbbbbbbbbbbbbbbbbbb  0002
alsdjl;asjdfkl;asdfjkljj  0003


Instead of the whole line, it just puts the code in its place. The larger the dictionary, the more codes it can handle. Normally, when a dictionary becomes full, it starts a new one on the fly. The new one starts out blank, and new codes are assigned to detected patterns.
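To make that concrete, here is a minimal sketch of an LZW-style encoder that behaves the way the paragraph above describes: it assigns codes to patterns it has seen and starts a fresh dictionary when the old one is full. The function name `lzw_encode` and the `max_entries` parameter are made up for illustration, and this is not what any particular archiver actually does internally.

```python
def lzw_encode(data: bytes, max_entries: int = 4096) -> list:
    """Toy LZW-style encoder (illustrative only, not a real archiver format).

    max_entries is a hypothetical cap on the number of dictionary entries.
    """
    if max_entries <= 256:
        raise ValueError("max_entries must leave room beyond the 256 single bytes")

    def fresh_table() -> dict:
        # Every single byte value starts with its own code (0..255).
        return {bytes([i]): i for i in range(256)}

    table = fresh_table()
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while the pattern is already known.
            current = candidate
            continue
        # Emit the code for the longest known pattern.
        codes.append(table[current])
        if len(table) < max_entries:
            # Room left: remember the new, longer pattern under the next code.
            table[candidate] = len(table)
        else:
            # Dictionary is full: start a new blank one on the fly.
            table = fresh_table()
        current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


if __name__ == "__main__":
    sample = b"a" * 1000
    print(len(sample), "bytes ->", len(lzw_encode(sample)), "codes")
```

With a bigger `max_entries` the encoder can keep longer patterns around before it has to throw everything away, which is the intuition behind "larger is better, up to a point".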

Generally, the larger the better, up to a point. The entire dictionary is held in memory, so you need more RAM than the dictionary size.

The right dictionary size depends on the compressibility of your data, the number of files, their individual sizes, and the overall size.

Generally, 32 MB is more than enough, but if you're compressing numerous multi-gigabyte files then a much higher number can be used. Larger dictionaries often make the process slower, but result in a smaller file.
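If you want to see the effect yourself, a rough experiment with Python's standard `lzma` module is sketched below. In LZMA-style compressors the dictionary works as a window that limits how far back a match can reach, so a repeat that sits farther back than the dictionary size goes unnoticed. The test data (a 256 KiB random block written twice) and the helper names `make_data` and `compressed_size` are my own assumptions for the demo, not something from the question.

```python
import lzma
import random


def make_data(block_size: int = 256 * 1024, seed: int = 0) -> bytes:
    """Build a hypothetical test input: one random block followed by a copy of itself."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    # The second copy starts exactly block_size bytes after the first.
    return block + block


def compressed_size(data: bytes, dict_size: int) -> int:
    """Compress with LZMA2 using an explicit dictionary size and return the output length."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


if __name__ == "__main__":
    data = make_data()
    for dict_size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
        size = compressed_size(data, dict_size)
        print(f"dict {dict_size // 1024:>5} KiB -> {size} bytes")
```

With the 64 KiB dictionary the compressor cannot reach back 256 KiB to find the first copy, so the output stays close to the input size. Once the dictionary covers the distance to the repeat, the second copy compresses to almost nothing, and going beyond the size of the data should not buy much more.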