Manual encoding
Special characters
Some characters, such as é and €, need special treatment to appear correctly in your EAD files. How you encode them depends on which character encoding you are using. Different character encodings support different sets (“repertoires”) of special characters; some are quite restricted, others very comprehensive. Whichever encoding you use, you shouldn't need to encode question marks or exclamation marks; you can simply type them on the keyboard.
The preferred encoding for the Archives Hub is UTF-8, but the Hub should accept, and work correctly with, input data in any of these encodings: US-ASCII, ISO-8859-1, Windows-1252 or UTF-8. Data output from the Hub always uses UTF-8, whatever the encoding of the input data.
Special characters can be encoded in two broad ways: as numeric references (eg &#233; for é) or as character entities (eg &eacute; for é). Please avoid character entities such as &eacute;, as these are not supported by the EAD DTD.
You may also be able to enter special characters directly, from your keyboard or a character map program. Whether you do this or use numeric references depends on which characters your character encoding supports and what works best with your workflow. Using XML numeric references is never wrong, and is often the safest and easiest option.
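As a rough illustration of where a numeric reference comes from: the decimal form is simply the character's Unicode code point written between `&#` and `;`, and XML also accepts a hexadecimal form written between `&#x` and `;`. This Python sketch prints both forms for é and €; the function names are illustrative only and are not part of any Hub tool.

```python
def numeric_reference(ch: str) -> str:
    """Decimal XML numeric character reference for one character, eg é -> &#233;."""
    return f"&#{ord(ch)};"


def hex_reference(ch: str) -> str:
    """Hexadecimal form of the same reference, eg é -> &#xE9;."""
    return f"&#x{ord(ch):X};"


for ch in "é€":
    # Prints "é &#233; &#xE9;" on the first pass and "€ &#8364; &#x20AC;" on the second.
    print(ch, numeric_reference(ch), hex_reference(ch))
```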
You can find the numeric references here: ISO 8879: Numeric and special graphical entities.
If you are using an XML editor, please get into the habit of running “well-formedness” checks, which should detect problems such as unescaped ampersands. Ideally, also validate against the EAD DTD before sending data to the Hub.
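A basic well-formedness check can also be scripted. This minimal Python sketch uses the standard library's `xml.etree.ElementTree`, which raises `ParseError` on an unescaped ampersand; the helper name `is_well_formed` is hypothetical. It checks well-formedness only and does not validate against the EAD DTD, which needs a validating tool (for example `xmllint --valid`).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML; otherwise report the error and return False.

    Note: this does not check the document against the EAD DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        print(f"Not well-formed: {err}")
        return False
    return True


# The first link contains a raw ampersand and fails; the second is escaped and passes.
print(is_well_formed('<extref href="http://example.org/doc?a=123&b=xyz">doc</extref>'))
print(is_well_formed('<extref href="http://example.org/doc?a=123&amp;b=xyz">doc</extref>'))
```

For a file on disk, `ET.parse(path)` can be used in the same way inside the `try` block.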
XML predefined entities
You must always use the XML predefined entities for:
- ampersand: &amp;
- less-than: &lt;
- greater-than: &gt;
Take care with ampersands when copying links and other data.
Unescaped ampersands are a common source of problems, especially within URIs. For example, if your browser displays a link like
http://example.org/doc?a=123&b=xyz
then when you put it into EAD as an extref link, you must use the &amp; entity:
http://example.org/doc?a=123&amp;b=xyz
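If you generate EAD from a script, the escaping can be done for you. This Python sketch uses `escape` and `quoteattr` from the standard library's `xml.sax.saxutils`; the `extref` helper is hypothetical, and it assumes the link goes in the `href` attribute of `<extref>`. Escaping is not idempotent: running an already-escaped URL through it again would produce `&amp;amp;`, so escape the raw URL exactly once.

```python
from xml.sax.saxutils import escape, quoteattr

url = "http://example.org/doc?a=123&b=xyz"  # as copied from the browser


def extref(href: str, label: str) -> str:
    """Build an <extref> element with the URL and link text escaped.

    quoteattr() escapes &, < and > and wraps the value in quotation marks;
    escape() does the same escaping for element content, without quotes.
    """
    return f"<extref href={quoteattr(href)}>{escape(label)}</extref>"


print(escape(url))  # http://example.org/doc?a=123&amp;b=xyz
print(extref(url, "Example document"))
# <extref href="http://example.org/doc?a=123&amp;b=xyz">Example document</extref>
```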
You only need to escape quotes and apostrophes in attribute values. If an attribute value contains an apostrophe or quotation mark, please encode it as &apos; or &quot; respectively, eg
<eadid identifier="chetham&apos;sarchive">Chetham's Archive</eadid>
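The same standard-library `escape` function from `xml.sax.saxutils` accepts an optional dictionary of extra replacements, which can be used to encode apostrophes and quotation marks as well. A minimal Python sketch, with a hypothetical helper name:

```python
from xml.sax.saxutils import escape

# Extra replacements, applied after escape() has handled &, < and >.
ATTRIBUTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_attribute_value(value: str) -> str:
    """Escape a string for an attribute value, encoding apostrophes and quotation marks too."""
    return escape(value, ATTRIBUTE_ENTITIES)


ident = escape_attribute_value("chetham'sarchive")
print(f'<eadid identifier="{ident}">Chetham\'s Archive</eadid>')
# <eadid identifier="chetham&apos;sarchive">Chetham's Archive</eadid>
```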
Hub exports
The Archives Hub exports data in UTF-8. This means you will probably see special characters present directly in exported files, even where your original files used numeric references.
If you prefer to use US-ASCII or ISO-8859-1 (or your editing tools require those encodings), we can convert the documents back to US-ASCII, in which special characters should be represented as numeric references.
Please be aware that changing the character encoding of a document is not just a matter of changing the encoding value in the XML declaration: you need to be sure that your software actually re-saves the document in the new encoding.
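For illustration, this Python sketch shows what a genuine re-encode involves: the text is written out as bytes in the new encoding, any character that encoding cannot represent becomes a numeric reference (via Python's `xmlcharrefreplace` error handler), and the XML declaration is rewritten to match. The helper name is hypothetical, and it assumes non-ASCII characters appear only in element text and attribute values (numeric references are not interpreted inside comments or CDATA sections, and are not allowed in element names).

```python
import re

# An XML declaration at the very start of the document: "<?xml" must be followed by
# whitespace, so a leading <?xml-stylesheet ...?> instruction is not mistaken for one.
XML_DECLARATION = re.compile(r"^<\?xml\s[^?]*\?>")


def reencode_xml(xml_text: str, encoding: str = "US-ASCII") -> bytes:
    """Return xml_text as bytes in the target encoding, with a matching declaration.

    Characters the target encoding cannot represent become decimal numeric
    references, eg é -> &#233; and € -> &#8364;.
    """
    body = XML_DECLARATION.sub("", xml_text, count=1).lstrip()
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>\n'
    return (declaration + body).encode(encoding, errors="xmlcharrefreplace")


sample = '<?xml version="1.0" encoding="UTF-8"?>\n<p>Café costs 5 €</p>'
print(reencode_xml(sample).decode("ascii"))
```

To convert a file, read it with `open(path, encoding="utf-8-sig")` (which also copes with a leading byte order mark) and write the returned bytes to a file opened in binary mode (`"wb"`), since the function already returns encoded bytes.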