Ubuntu – Chinese encoding in names of compressed files in zip


Sorry for asking a question similar to my previous one. The difference is that this time the files are inside a zip archive, and the Chinese names of the compressed files are not recognized, either when listing the archive's contents or after extraction:

$ unzip -l "严蔚敏数据结构(c语言版)教材及答案.zip"
Archive:  严蔚敏数据结构(c语言版)教材及答案.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    25600  2000-01-04 23:27   ?+?+i- ??-?.doc
    80896  2000-01-04 23:27   ?+??i- -+.doc
    41984  2000-01-04 23:27   ?++?i- i+????-?.doc
    52224  2000-01-04 23:27   ?+?+i- ??i?.doc
    50688  2000-01-04 23:27   ?+??i- ??????.doc
    54272  2000-01-04 23:27   ?++?i- -????-??????.doc
    26112  2000-01-04 23:27   ?+?-i- ?????????_+?.doc
    76288  2000-01-04 23:27   ?+-?i- -??-????-?.doc
    53760  2000-01-04 23:27   ?+-?i- -+?+++?=.doc
    53760  2000-01-04 23:27   ?+--i- ??.doc
  7929077  2009-02-26 22:49   -???????+C????+??+?+?+pdf.pdf
---------                     -------
  8444661                     11 files

How can I deal with this problem?

Thanks and regards!


I have uploaded this zip archive; it can be downloaded from http://www.mediafire.com/?dw87ee72m56evy9

I tried to use chardet to determine the encoding of the compressed files' names:

$ unzip -l "严蔚敏数据结构(c语言版)教材及答案.zip" | chardet
<stdin>: utf-8 (confidence: 0.99)

But are the file names really encoded in UTF-8? Shouldn't they be in a foreign (Chinese) encoding? I suspect the output of unzip -l contains too much besides the names. How can I extract only the file names from its output and pass them to chardet?
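The `utf-8` verdict probably says little about the stored names: chardet saw unzip's converted output, where the names are already mangled into `?` and other ASCII placeholders, while the `Archive:` line carries the archive's own name in the UTF-8 that the shell passed in. A minimal sketch of getting at the stored bytes instead, using Python's standard `zipfile` module rather than parsing `unzip -l` output (the script name `rawnames.py` below is hypothetical). It assumes Python 3 and relies on `zipfile` decoding names that lack the UTF-8 flag (bit 11 of the general-purpose flags) as CP437, which maps every byte value to a character and can therefore be reversed exactly.

```python
import sys
import zipfile

# Bit 11 of the general-purpose flags: the entry name is stored as UTF-8.
UTF8_FLAG = 0x800


def raw_names(path):
    """Return the stored (undecoded) name bytes of every entry in the zip at *path*."""
    names = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                # zipfile decoded this name as UTF-8; encoding it again gives the bytes back.
                names.append(info.orig_filename.encode("utf-8"))
            else:
                # Otherwise zipfile decoded it as CP437, which maps all 256 byte
                # values, so encoding with CP437 recovers the original bytes.
                names.append(info.orig_filename.encode("cp437"))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Write one raw name per line, suitable for piping into chardet.
    for name in raw_names(sys.argv[1]):
        sys.stdout.buffer.write(name + b"\n")
```

Usage would then be `python3 rawnames.py "严蔚敏数据结构(c语言版)教材及答案.zip" | chardet`, so that chardet only sees the names themselves.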

Best Answer

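  • Try telling unzip which character set the file names are stored in, using the -O option (CP936 is the Windows code page for Simplified Chinese, essentially GBK):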

    unzip -O cp936 "严蔚敏数据结构(c语言版)教材及答案.zip"
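For reference, a rough Python counterpart of `unzip -O cp936`, in case the installed unzip lacks that option. This is a sketch assuming Python 3 and names stored in CP936/GBK, as the answer implies; `fix_name` and `extract_all` are hypothetical helper names. It undoes `zipfile`'s CP437 decoding of names that lack the UTF-8 flag and re-decodes the bytes as CP936 before extracting. On Python 3.11 or later, `zipfile.ZipFile(path, metadata_encoding="cp936")` performs the same decoding directly.

```python
import sys
import zipfile

# Bit 11 of the general-purpose flags: the entry name is stored as UTF-8.
UTF8_FLAG = 0x800


def fix_name(info, encoding="cp936"):
    """Return the entry name re-decoded with *encoding* (assumed CP936/GBK here)."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded it as UTF-8
    # zipfile decoded the raw bytes as CP437; reverse that, then decode properly.
    return info.orig_filename.encode("cp437").decode(encoding)


def extract_all(path, dest=".", encoding="cp936"):
    """Extract every entry of the zip at *path* into *dest* using re-decoded names."""
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            # Only the name used for the output path changes; zipfile still checks
            # each local header against info.orig_filename, which is left intact.
            info.filename = fix_name(info, encoding)
            zf.extract(info, dest)


if __name__ == "__main__" and len(sys.argv) > 1:
    extract_all(sys.argv[1])
```

Setting `info.filename` before calling `extract` keeps `zipfile`'s own path sanitizing (stripping `..` components and absolute prefixes) instead of writing files by hand.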