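I have recently been working on a decompression project that has to handle documents in many different file formats. A parser determines a file's format mainly from the bytes at the start of the file, known as the file signature.
For example, to determine whether a file is an APK (which uses the same container as ZIP and similar archives), check whether its first four bytes are "50 4B 03 04".
The DEX file obtained by unpacking an APK starts with a fixed 8-byte sequence ("64 65 78 0A 30 33 35 00", where "30 33 35" spells the format version 035); other formats are identified in a similar way.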
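The two checks described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `sniff_header` and `sniff_file` are made up for this example, and a ZIP signature alone cannot tell an APK apart from a JAR or a plain ZIP archive.

```python
# Signatures taken from the text above.
ZIP_MAGIC = bytes.fromhex("504B0304")          # "PK\x03\x04"
DEX_MAGIC = bytes.fromhex("6465780A30333500")  # "dex\n035\0"


def sniff_header(header: bytes) -> str:
    """Classify a file from its leading bytes (pass at least 8 bytes)."""
    if header.startswith(ZIP_MAGIC):
        # APK, JAR, DOCX and others share the ZIP container signature.
        return "zip"
    if header.startswith(DEX_MAGIC):
        return "dex"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_header(f.read(8))
```

Reading only the header keeps the check cheap even for large archives; telling an APK from other ZIP-based formats would require looking inside the archive.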
The list of file signatures below is copied here for quick reference:
Hex signature | ISO 8859-1 | Offset | File extension | Description |
---|---|---|---|---|
00 | . | 0 | PIC, PIF | IBM Storyboard bitmap file; Windows Program Information File |
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ........ ........ | 11 | PDB | PalmPilot Database/Document File |
00 00 00 nn 66 74 79 70 33 67 70 | ....ftyp3gp | 0 | 3GG, 3GP, 3G2 | 3rd Generation Partnership Project 3GPP (nn=0x14) and 3GPP2 (nn=0x20) multimedia files |
00 00 00 nn 66 74 79 70 33 67 70 35 | ....ftyp3gp5 | 0 | MP4 | MPEG-4 video files |
00 00 01 00 | .... | 0 | ico | Computer icon encoded in ICO file format[1] |
00 01 00 00 | ... | 0 | ... | Palm Desktop Data File (Access format) |
00 01 42 44 | ... | 0 | DBA | Palm Desktop To Do Archive |
00 01 44 54 | ... | 0 | TDA | Palm Desktop Calendar Archive |
05 07 00 00 42 4F 42 4F 05 07 00 00 00 00 00 00 00 00 00 00 00 01 | ....BOBO............ | 0 | cwk | AppleWorks 5 document |
06 07 E1 00 42 4F 42 4F 06 07 E1 00 00 00 00 00 00 00 00 00 00 01 | ....BOBO............ | 0 | cwk | AppleWorks 6 document |
1F 9D | .. | 0 | z, tar.z | Compressed file (often tar zip) using the Lempel-Ziv-Welch algorithm |
1F A0 | .. | 0 | z, tar.z | Compressed file (often tar zip) using the LZH algorithm |
24 53 44 49 30 30 30 31 | $SDI0001 | 0 | | System Deployment Image, a disk image format used by Microsoft |
25 21 50 53 | %!PS | 0 | ps | PostScript document |
25 50 44 46 | %PDF | 0 | pdf | PDF document |
30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C | 0&²u.fÏ.¦Ù.ª.bÎl | 0 | asf, wma, wmv | Advanced Systems Format[8] |
38 42 50 53 | 8BPS | 0 | psd | Photoshop Document file, Adobe Photoshop's native file format |
41 47 44 33 | AGD3 | 0 | fh8 | FreeHand 8 document[18][19][20] |
42 4D | BM | 0 | bmp, dib | BMP file, a bitmap format used mostly in the Windows world |
42 5A 68 | BZh | 0 | bz2 | Compressed file using Bzip2 algorithm |
43 44 30 30 31 | CD001 | 0x8001, 0x8801 or 0x9001 | iso | ISO9660 CD/DVD image file[9] |
43 72 32 34 | Cr24 | 0 | crx | Google Chrome extension[16] or packaged app[17] |
45 52 02 00 00 00 or 8B 45 52 02 00 00 00 | ER.... or .ER.... | 0 | toast | Roxio Toast disc image file; some .dmg files also begin with the same bytes |
46 4F 52 4D nn nn nn nn 38 53 56 58 | FORM....8SVX | 0, any | 8svx, 8sv, svx, snd, iff | IFF 8-Bit Sampled Voice |
46 4F 52 4D nn nn nn nn 41 43 42 4D | FORM....ACBM | 0, any | acbm, iff | Amiga Contiguous Bitmap |
46 4F 52 4D nn nn nn nn 41 49 46 46 | FORM....AIFF | 0, any | aiff, aif, aifc, snd, iff | Audio Interchange File Format |
46 4F 52 4D nn nn nn nn 41 4E 42 4D | FORM....ANBM | 0, any | anbm, iff | IFF Animated Bitmap |
46 4F 52 4D nn nn nn nn 41 4E 49 4D | FORM....ANIM | 0, any | anim, iff | IFF CEL Animation |
46 4F 52 4D nn nn nn nn 43 4D 55 53 | FORM....CMUS | 0, any | cmus, mus, iff | IFF Musical Score |
46 4F 52 4D nn nn nn nn 46 41 4E 54 | FORM....FANT | 0, any | iff | Amiga Fantavision Movie |
46 4F 52 4D nn nn nn nn 46 41 58 58 | FORM....FAXX | 0, any | faxx, fax, iff | IFF Facsimile Image |
46 4F 52 4D nn nn nn nn 46 54 58 54 | FORM....FTXT | 0, any | ftxt, txt, iff | IFF Formatted Text |
46 4F 52 4D nn nn nn nn 49 4C 42 4D | FORM....ILBM | 0, any | ilbm, lbm, ibm, iff | IFF Interleaved Bitmap Image |
46 4F 52 4D nn nn nn nn 53 4D 55 53 | FORM....SMUS | 0, any | smus, smu, mus, iff | IFF Simple Musical Score |
46 4F 52 4D nn nn nn nn 59 55 56 4E | FORM....YUVN | 0, any | yuvn, yuv, iff | IFF YUV Image |
47 49 46 38 37 61 or 47 49 46 38 39 61 | GIF87a or GIF89a | 0 | gif | Image file encoded in the Graphics Interchange Format (GIF)[2] |
49 44 33 | ID3 | 0 | mp3 | MP3 file with an ID3v2 container |
49 49 2A 00 (little-endian format) or 4D 4D 00 2A (big-endian format) | II*. or MM.* | 0 | tif, tiff | Tagged Image File Format |
4B 44 4D | KDM | 0 | vmdk | VMDK files [14][15] |
4D 54 68 64 | MThd | 0 | mid, midi | MIDI sound file[12] |
4D 5A | MZ | 0 | exe | DOS MZ executable file format and its descendants (including NE and PE) |
4E 45 53 1A | NES | 0 | nes | Nintendo Entertainment System ROM file [25] |
4F 67 67 53 | OggS | 0 | ogg, oga, ogv | Ogg, an open source media container format |
50 4B 03 04, 50 4B 05 06 (empty archive) or 50 4B 07 08 (spanned archive) | PK.. | 0 | zip, jar, odt, ods, odp, docx, xlsx, pptx, apk | zip file format and formats based on it, such as JAR, ODF, OOXML |
50 4D 4F 43 43 4D 4F 43 | PMOCCMOC | 0 | dat | Windows Files And Settings Transfer Repository[22] See also USMT 3.0 (Win XP)[23] and USMT 4.0 (Win 7)[24] User Guides |
52 49 46 46 nn nn nn nn 57 41 56 45 | RIFF....WAVE | 0 | wav | Waveform Audio File Format |
52 61 72 21 1A 07 00 | Rar!... | 0 | rar | RAR archive version 1.50 onwards[3] |
52 61 72 21 1A 07 01 00 | Rar!.... | 0 | rar | RAR archive version 5.0 onwards[4] |
53 44 50 58 (big-endian format) or 58 50 44 53 (little-endian format) | SDPX or XPDS | 0 | dpx | SMPTE DPX image |
53 49 4D 50 4C 45 20 20 3D 20 20 20 20 20 20 20 | SIMPLE = T | 0 | fits | Flexible Image Transport System (FITS)[10] |
64 65 78 0A 30 33 35 00 | dex.035. | 0 | dex | Dalvik Executable |
66 4C 61 43 | fLaC | 0 | flac | Free Lossless Audio Codec[11] |
75 73 74 61 72 00 30 30 or 75 73 74 61 72 20 20 00 | ustar.00 or ustar  . | 257 | tar | tar archive[26] |
76 2F 31 01 | v/1. | 0 | exr | OpenEXR image |
78 01 73 0D 62 62 60 | x.s.bb` | 0 | dmg | Apple Disk Image file |
78 61 72 21 | xar! | 0 | xar | eXtensible ARchive format[21] |
7F 45 4C 46 | .ELF | 0 | | Executable and Linkable Format |
80 2A 5F D7 | .*_. | 0 | cin | Kodak Cineon image |
89 50 4E 47 0D 0A 1A 0A | .PNG.... | 0 | png | Image encoded in the Portable Network Graphics format[5] |
BE BA FE CA | ¾ºþÊ | 0 | DBA | Palm Desktop Calendar Archive |
CA FE BA BE | Êþº¾ | 0 | class | Java class file, Mach-O Fat Binary |
CE FA ED FE | Îúíþ | 0 | | Mach-O binary (reverse byte ordering scheme, 32-bit)[6] |
CF FA ED FE | Ïúíþ | 0 | | Mach-O binary (reverse byte ordering scheme, 64-bit)[7] |
D0 CF 11 E0 A1 B1 1A E1 | ÐÏ.à¡±.á | 0 | doc, xls, ppt | Microsoft Office documents[13] |
EF BB BF | ï»¿ | 0 | | UTF-8 encoded Unicode byte order mark, commonly seen in text files |
FE ED FA CE | þíúÎ | 0 or typically 0x1000 | | Mach-O binary (32-bit) |
FE ED FA CF | þíúÏ | 0 or typically 0x1000 | | Mach-O binary (64-bit) |
FF D8 FF | ÿØÿ | 0 | jpg, jpeg | JPEG |
FF FB | ÿû | 0 | mp3 | MPEG-1 Layer 3 file without an ID3 tag or with an ID3v1 tag (which is appended at the end of the file) |
FF FE | ÿþ | 0 | | Byte-order mark for text file encoded in little-endian 16-bit Unicode Transfer Format |
FF FE 00 00 | ÿþ.. | 0 | | Byte-order mark for text file encoded in little-endian 32-bit Unicode Transfer Format |
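
Several rows in the table need more than a plain prefix comparison: some signatures contain wildcard bytes (`nn`), some sit at a non-zero offset (tar at 257), and some are prefixes of others (`FF FE` versus `FF FE 00 00`). The following is a minimal sketch of a table-driven matcher that handles these cases. The names `SIGNATURES`, `pattern`, `matches` and `identify` are illustrative only, and just a handful of rows from the table are included.

```python
from typing import List, Optional, Tuple

# One entry: (pattern, offset, label). None in a pattern is a wildcard ("nn").
Signature = Tuple[List[Optional[int]], int, str]


def pattern(hex_text: str) -> List[Optional[int]]:
    """Parse table text such as '52 49 46 46 nn nn' into byte values."""
    return [None if tok == "nn" else int(tok, 16) for tok in hex_text.split()]


# A few rows copied from the table above; labels are arbitrary strings.
SIGNATURES: List[Signature] = [
    (pattern("FF FE"), 0, "utf-16le-bom"),
    (pattern("FF FE 00 00"), 0, "utf-32le-bom"),
    (pattern("52 49 46 46 nn nn nn nn 57 41 56 45"), 0, "wav"),
    (pattern("89 50 4E 47 0D 0A 1A 0A"), 0, "png"),
    (pattern("75 73 74 61 72 00 30 30"), 257, "tar"),
    (pattern("75 73 74 61 72 20 20 00"), 257, "tar"),
]


def matches(data: bytes, pat: List[Optional[int]], offset: int) -> bool:
    """True if `pat` matches `data` at `offset`, treating None as 'any byte'."""
    window = data[offset:offset + len(pat)]
    if len(window) < len(pat):
        return False  # not enough bytes available at this offset
    return all(p is None or p == b for p, b in zip(pat, window))


def identify(data: bytes) -> Optional[str]:
    """Return the label of the longest matching signature, or None."""
    # Longest patterns first, so FF FE 00 00 wins over its prefix FF FE.
    ordered = sorted(SIGNATURES, key=lambda s: len(s[0]), reverse=True)
    for pat, offset, label in ordered:
        if matches(data, pat, offset):
            return label
    return None
```

With these rows, the caller has to pass at least the first 265 bytes of a file for the tar check to have a chance of matching; signatures at larger offsets, such as the ISO9660 entry, would need a correspondingly larger read.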
References:
1. http://en.wikipedia.org/wiki/List_of_file_signatures
2. http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html#Exec
Original article: http://blog.csdn.net/richerg85/article/details/39320549