
Appendix E. Common Content Encodings

In an ideal world, the only character encodings (or, loosely, "character sets") that you'd ever see would be UTF-8 (utf-8) and, for all those legacy documents, Latin-1 (iso-8859-1). However, the encodings listed below also exist and can be found on the Web. They are listed in order of their English names, with the lefthand column being the value returned by $response->content_charset. The complete list of character sets can be found at http://www.iana.org/assignments/character-sets.

Value	Encoding
us-ascii	ASCII plain (just characters 0x00-0x7F)
asmo-708	Arabic ASMO-708
iso-8859-6	Arabic ISO
dos-720	Arabic MSDOS
windows-1256	Arabic MSWindows
iso-8859-4	Baltic ISO
windows-1257	Baltic MSWindows
iso-8859-2	Central European ISO
ibm852	Central European MSDOS
windows-1250	Central European MSWindows
hz-gb-2312	Chinese Simplified (HZ)
gb2312	Chinese Simplified (GB2312)
euc-cn	Chinese Simplified EUC
big5	Chinese Traditional (Big5)
cp866	Cyrillic DOS
iso-8859-5	Cyrillic ISO
koi8-r	Cyrillic KOI8-R
koi8-u	Cyrillic KOI8-U
windows-1251	Cyrillic MSWindows
iso-8859-7	Greek ISO
windows-1253	Greek MSWindows
iso-8859-8-i	Hebrew ISO Logical
iso-8859-8	Hebrew ISO Visual
dos-862	Hebrew MSDOS
windows-1255	Hebrew MSWindows
euc-jp	Japanese EUC-JP
iso-2022-jp	Japanese JIS
shift_jis	Japanese Shift-JIS
iso-2022-kr	Korean ISO
euc-kr	Korean Standard
windows-874	Thai MSWindows
iso-8859-9	Turkish ISO
windows-1254	Turkish MSWindows
utf-8	Unicode expressed as UTF-8
utf-16	Unicode expressed as UTF-16
windows-1258	Vietnamese MSWindows
viscii	Vietnamese VISCII
iso-8859-1	Western European (Latin-1)
windows-1252	Western European (Latin-1) with extra characters in 0x80-0x9F
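
As the last two rows of the table suggest, the charset label matters even between closely related encodings. The following is a minimal sketch, not taken from the original text, that uses the core Encode module to decode the same byte under two labels from the table. The helper name decode_body and its fallback to iso-8859-1 are assumptions made for illustration; in a real program the label would typically come from $response->content_charset and the bytes from the response body.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode raw bytes using a charset label from the
# table. If Encode does not recognize the label, fall back to iso-8859-1
# (an assumption for this sketch, not a rule from the text).
sub decode_body {
    my ($charset, $bytes) = @_;
    my $enc = find_encoding($charset) || find_encoding('iso-8859-1');
    return $enc->decode($bytes);
}

# The byte 0x80 is the euro sign in windows-1252 but a C1 control
# character in iso-8859-1.
my $byte = "\x80";
printf "windows-1252: U+%04X\n", ord(decode_body('windows-1252', $byte));
printf "iso-8859-1:   U+%04X\n", ord(decode_body('iso-8859-1',   $byte));
```

Run as written, this should print U+20AC for windows-1252 and U+0080 for iso-8859-1. Not every label in the table is necessarily known to a given Encode installation, which is why the sketch checks the result of find_encoding before decoding.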

