Cork encoding

The Cork (also known as T1 or EC) encoding is a character encoding used for encoding glyphs in fonts.[1] It is named after the city of Cork in Ireland, where during a TeX Users Group (TUG) conference in 1990 a new encoding was introduced for LaTeX.[1] It contains 256 characters supporting most west and east-European languages with the Latin alphabet.[2]

Details

In 8-bit TeX engines the font encoding has to match the encoding of hyphenation patterns where this encoding is most commonly used.[3] In LaTeX one can switch to this encoding with `\usepackage[T1]{fontenc}`, while in ConTeXt MkII this is the default encoding already. In modern engines such as XeTeX and LuaTeX the Unicode is fully supported and the 8-bit font encodings are obsolete.

Character set

Cork encoding
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
0_
0
`
0060
´
00B4
ˆ
02C6
˜
02DC
¨
00A8
˝
02DD
˚
02DA
ˇ
02C7
˘
02D8
¯
00AF
˙
02D9
¸
00B8
˛
02DB

201A

2039

203A
1_
16

201C

201D

201E
«
00AB
»
00BB

2013

2014
ZWSP
200B
[a]
2080
ı[b]
0131
ȷ[b]
0237

FB00

FB01

FB02

FB03

FB04
2_
32

0020
!
0021
"
0022
#
0023
\$
0024
%
0025
&
0026

2019
(
0028
)
0029
*
002A
+
002B
,
002C
-
002D
.
002E
/
002F
3_
48
0
0030
1
0031
2
0032
3
0033
4
0034
5
0035
6
0036
7
0037
8
0038
9
0039
:
003A
;
003B
<
003C
=
003D
>
003E
?
003F
4_
64
@
0040
A
0041
B
0042
C
0043
D
0044
E
0045
F
0046
G
0047
H
0048
I
0049
J
004A
K
004B
L
004C
M
004D
N
004E
O
004F
5_
80
P
0050
Q
0051
R
0052
S
0053
T
0054
U
0055
V
0056
W
0057
X
0058
Y
0059
Z
005A
[
005B
\
005C
]
005D
^
005E
_
005F
6_
96

2018
a
0061
b
0062
c
0063
d
0064
e
0065
f
0066
g
0067
h
0068
i
0069
j
006A
k
006B
l
006C
m
006D
n
006E
o
006F
7_
112
p
0070
q
0071
r
0072
s
0073
t
0074
u
0075
v
0076
w
0077
x
0078
y
0079
z
007A
{
007B
|
007C
}
007D
~
007E
[c]
8_
128
Ă
0102
Ą
0104
Ć
0106
Č
010C
Ď
010E
Ě
011A
Ę
0118
Ğ
011E
Ĺ
0139
Ľ
013D
Ł
0141
Ń
0143
Ň
0147
Ŋ
014A
Ő
0150
Ŕ
0154
9_
144
Ř
0158
Ś
015A
Š
0160
Ş
015E
Ť
0164
Ţ
0162
Ű
0170
Ů
016E
Ÿ
0178
Ź
0179
Ž
017D
Ż
017B
Ĳ
0132
İ
0130
đ
0111
§
00A7
A_
160
ă
0103
ą
0105
ć
0107
č
010D
ď
010F
ě
011B
ę
0119
ğ
011F
ĺ
013A
ľ
013E
ł
0142
ń
0144
ň
0148
ŋ
014B
ő
0151
ŕ
0155
B_
176
ř
0159
ś
015B
š
0161
ş
015F
ť
0165
ţ
0163
ű
0171
ů
016F
ÿ
00FF
ź
017A
ž
017E
ż
017C
ĳ
0133
¡
00A1
¿
00BF
£
00A3
C_
192
À
00C0
Á
00C1
Â
00C2
Ã
00C3
Ä
00C4
Å
00C5
Æ
00C6
Ç
00C7
È
00C8
É
00C9
Ê
00CA
Ë
00CB
Ì
00CC
Í
00CD
Î
00CE
Ï
00CF
D_
208
Ð/Đ[d]
00D0
Ñ
00D1
Ò
00D2
Ó
00D3
Ô
00D4
Õ
00D5
Ö
00D6
Œ
0152
Ø
00D8
Ù
00D9
Ú
00DA
Û
00DB
Ü
00DC
Ý
00DD
Þ
00DE
SS[e]
1E9E
E_
224
à
00E0
á
00E1
â
00E2
ã
00E3
ä
00E4
å
00E5
æ
00E6
ç
00E7
è
00E8
é
00E9
ê
00EA
ë
00EB
ì
00EC
í
00ED
î
00EE
ï
00EF
F_
240
ð
00F0
ñ
00F1
ò
00F2
ó
00F3
ô
00F4
õ
00F5
ö
00F6
œ
0153
ø
00F8
ù
00F9
ú
00FA
û
00FB
ü
00FC
ý
00FD
þ
00FE
ß
00DF

Notes

• Hexadecimal values under the characters in the table are the Unicode character codes.
• The first 12 characters are often used as combining characters.
1. ^ 0x18 is just a "trailing zero", used to compose or (or arbitrary smaller quantities) out of percent sign (%).
2. ^ a b Dotless i and dotless j may be used to compose accented variants like i with macron (ī).
3. ^ 0x7F is the hyphenation character (not really a soft hyphen).
4. ^ 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110) which might be a problem at some occasions (like copying text from PDF, hyphenation, ...)
5. ^ 0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into the uppercase form.

Supported languages

The encoding supports most European languages written in Latin alphabet. Notable exceptions are:

Languages with slightly suboptimal support include:

References

1. ^ a b Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
2. ^ Ferguson, Michael (1990), "Report on Multilingual Activities" (PDF), TUGboat, Volume 11 (Issue 4): 514–516
3. ^