EAD for the Hub
Introduction
This is a guide to the EAD used by the Archives Hub. It provides:
- A guide to the Archives Hub implementation of EAD.
- Information for developing or improving an export from an archival system to contribute descriptions to the Archives Hub.
- Help with cataloguing, although it is geared towards exporting in EAD. We also have a cataloguing guide which does not require any knowledge of EAD.
We have EAD templates for minimal EAD requirements and standard EAD requirements. These may be useful if you want to develop an export.
To find out more about exporting from archival management systems please see: Archive Systems Export.
You may also wish to consult our information on Hub mandatory fields.
- What does the Archives Hub require?
- The anatomy of an EAD Description
- Terms Used
- <eadheader>
- <archdesc> <did>
- <archdesc>
- <controlaccess>
- <components>
- http links
What does the Archives Hub require?
The Archives Hub currently works with EAD2002. Our EAD is compliant with the DTD.
EAD is very permissive, and there are a large number of tags that you can potentially use. It is important to be aware that the Hub does not display the content of all tags available within EAD. We have listed those that we require or that we recommend here, along with any additional attribute or content requirements.
This page describes:
Status: Whether the field is mandatory.
Requirements: What the Archives Hub requires in order to process a description (Hub Standard EAD).
Hub Processing: Data processing work that the Hub will or can carry out.
Further Information: Other useful information.
The Hub 'pipelines' clean up and 'normalise' the data. We can often process data in order to ensure that it meets requirements, so please be aware that you may not need to meet all requirements with the submitted data. It is best if we talk to you directly about your data and what we can do with the data on ingest, as everybody's data is different.
You can include content within other tags and attributes that are not listed if they are appropriate for your data. However, the Hub does not display the content of all possible EAD tags so you would need to talk to us if there is an issue with data not being displayed.
In some instances, we cannot display content in the way that you might prefer because we have to find a balance between consistency, efficiency, interoperability and different contributors' cataloguing practices.
Contributors usually use an Archon code for their repository code. These can be found through the 'Find an archive' search in TNA's Discovery database, where you can also apply for a code if you do not have one.
If you are submitting a description of an Online Resource, which is a website describing archives, often a digitised collection of aggregated content, you do not need an Archon Code.
The Contents of an EAD Description
An EAD document has the following layout:
<ead> (root tag)
<eadheader> (metadata about the EAD description)
<archdesc> (descriptive data)
<did> (core data within the descriptive data)
<controlaccess> (index terms)
<dsc> (wrapper for subordinate components if included)
<c> (repeatable component wrapper)
Terms used
unit of description = any level of description, from a collection or fonds through to a subfonds, series, subseries, file or item within an archival collection.
tag content = the content of the tag rather than any attributes or attribute values. So <unitdate normal="1900/1920">Early 20th century</unitdate> has the content 'Early 20th century' and the attribute value '1900/1920'.
<c level="{value}"> = an indication of what is required as content - the curly brackets are not part of the content.
@ is used to signify an attribute name, e.g. @level means the 'level' attribute of an archival description.
Examples are in italics and indicate the preferred or usual formatting for Hub EAD.
Header <eadheader>
<eadheader> @repositoryencoding
Status: mandatory
Requirement: a repository encoding value
Hub processing: We will add this attribute if required. We will assume that the
repository code being used is Archon if the encoding value is not provided. If you use ISIL, then
this needs to be the encoding value. You will need to talk to us if you use other recognised
repository identifiers.
<eadheader repositoryencoding="archon">
<eadheader repositoryencoding="isil">
Finding aid identifier <eadid>
Status: mandatory
Requirement: The <eadid> requires three attributes - the countrycode,
mainagencycode and identifier
Hub processing: The Hub will populate the <eadid> tag content, the
@countrycode, @mainagencycode and @identifier, and will also provide the @url attribute values as
long as we have the required information in the <unitid> (archive unit identifier). This means
that content you may have in the <eadid> will be overwritten during our processing with Hub
generated values.
Further information:
- countrycode: uppercase ISO 3166-1 code (GB, IM)
- mainagencycode: the Archon code for the agency that created the description, excluding leading zeros. (Usually the same as the agency that provides intellectual access to the content, but may differ)
- identifier: a reference for the finding aid. Cannot be more than 256 characters and cannot end with a trailing slash
- The content of the tag comprises the countrycode, mainagencycode and identifier.
- The content must not contain: # or % characters.
The Hub formatting will include a hyphen between the codes and the identifier, to ensure uniqueness:
{countrycode}{mainagencycode}{hyphen}{identifier}
<eadid countrycode="GB" mainagencycode="1234" identifier="ms ab">gb1234-msab</eadid>
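As a rough illustration only (this is not Hub code), the pattern above could be sketched in Python as follows. The function name build_eadid is hypothetical, and lowercasing the country code and removing whitespace from the identifier are assumptions inferred from the example, not documented Hub behaviour.

```python
def build_eadid(countrycode: str, mainagencycode: str, identifier: str) -> str:
    """Compose <eadid> content as {countrycode}{mainagencycode}-{identifier}."""
    if len(identifier) > 256:
        raise ValueError("identifier cannot be more than 256 characters")
    if identifier.endswith("/"):
        raise ValueError("identifier cannot end with a trailing slash")
    if "#" in identifier or "%" in identifier:
        raise ValueError("content must not contain # or %")
    # Assumption: whitespace is dropped from the identifier, as in the
    # example above ("ms ab" becomes "msab").
    compact = "".join(identifier.split())
    # Leading zeros are excluded from the agency code, as stated above.
    agency = mainagencycode.lstrip("0")
    # Assumption: the country code is lowercased in the tag content.
    return f"{countrycode.lower()}{agency}-{compact}"
```

Called as build_eadid("GB", "1234", "ms ab"), this sketch gives gb1234-msab, matching the example.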
Finding aid title <titleproper>
Status: mandatory
Requirements: The titleproper cannot be a single word and cannot be fewer than 10
characters. It must be unique within a repository.
Hub processing: We will add a <titleproper> if it is not present, using the
content of the <unittitle>
Further information:
- We recommend not more than 500 characters.
- The <titleproper> is typically the same as the title (unittitle) of the top-level unit of description
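If you want to check titles before export, a minimal sketch in Python might look like this. The function name titleproper_problems is hypothetical, and the check for uniqueness within a repository is left out because it needs the whole set of descriptions.

```python
def titleproper_problems(title: str) -> list[str]:
    """Return a list of problems found in a <titleproper> value."""
    problems = []
    text = title.strip()
    if len(text.split()) < 2:
        problems.append("single word")
    if len(text) < 10:
        problems.append("fewer than 10 characters")
    if len(text) > 500:
        # 500 characters is a recommendation rather than a hard limit.
        problems.append("longer than the recommended 500 characters")
    return problems
```

An empty list means the title passes these particular checks.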
Finding aid creation <creation>
Status: non mandatory, but recommended
Requirements: None
Hub processing: No actions
Further information: We recommend the use of <creation> within
<profiledesc> to provide information about the creation of the EAD description. This field can
also include a <date> tag.
Finding aid language <langusage>
Status: Non mandatory
Requirements: None
Hub processing: The Hub will add 'English' as the language of the finding aid if
language is not already present.
Finding aid revisions <revisiondesc>
Status: Non mandatory, but recommended to record revisions to the description.
Requirements: None.
Hub processing: No actions
Other tags within the <eadheader>
The Hub will store any other content within other tags, such as <publicationstmt> but the information may not necessarily be utilised or displayed on our website.
Archive Description core fields <archdesc> <did>
Status: mandatory
Requirements: Must include <unitid>, <unittitle>, <unitdate>,
<language>, <extent>, <repository>. Fields within the <did> cannot
include paragraph tags.
Hub processing: see below, under individual tags
Archive Description Component core fields <c> <did>
Status: Mandatory.
Requirements: Must include <unitid>, <unittitle>
Level of Description @level
Status: Mandatory at top level. Recommended at lower levels.
Requirements: Controlled level values (class, collection, file, fonds, item, otherlevel,
recordgrp, series, subfonds, subgrp, subseries). The otherlevel value, which can only be used at lower
levels, must be accompanied by an @otherlevel value, e.g. subcollection, section, subsection, subfile, piece,
box, folder.
Hub processing: Variations to controlled values are usually corrected,
e.g. SubSeries and sub-series are converted to subseries, which is the EAD controlled term. Terms
such as subsubsub-series are converted to subseries, as the Hub does not require values beyond one
'sub' level and it is the <c> tags rather than level values that control the hierarchy.
<archdesc level="fonds">
<c level="otherlevel" otherlevel="piece">
Further Information
- If you omit level values, your descriptions may not be included in a filter by level provided on the Hub website.
- Usually the value is 'fonds' or 'collection' at the top level of description, but you can provide descriptions at other levels
- Please talk to us about additional otherlevel values if you need them.
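The kind of correction described under Hub processing could be approximated as below. This is a sketch under assumptions, not the Hub's pipeline: the function name normalise_level and the exact cleaning rules are hypothetical, and the set of values is the EAD 2002 list, which includes 'item' as used in the component examples in this guide.

```python
import re

# EAD 2002 controlled values for @level.
EAD_LEVELS = {
    "class", "collection", "file", "fonds", "item", "otherlevel",
    "recordgrp", "series", "subfonds", "subgrp", "subseries",
}

def normalise_level(value: str) -> str:
    """Map variants such as 'SubSeries' or 'sub-series' to the EAD term."""
    # Lowercase, then drop spaces, hyphens and underscores.
    cleaned = re.sub(r"[\s_-]+", "", value.strip().lower())
    # Collapse repeated 'sub' prefixes to one: 'subsubsubseries' -> 'subseries'.
    cleaned = re.sub(r"^(?:sub)+", "sub", cleaned)
    if cleaned not in EAD_LEVELS:
        raise ValueError(f"not an EAD level value: {value!r}")
    return cleaned
```

Values that are not controlled terms, such as 'piece', raise an error here; in Hub EAD these belong in @otherlevel alongside level="otherlevel".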
Identifier <unitid>
Status: Mandatory.
Requirements: All units of description must have a unique identifier. This is
vital for successful ingest into the Hub system. Hub EAD requires three attributes within the
<unitid> - the countrycode, repositorycode and identifier.
- countrycode: uppercase ISO 3166-1 code (GB, IM)
- repositorycode: usually the Archon code for the agency that holds the material, excluding leading zeros (typically the same as the agency that created the finding aid)
- identifier: a local reference. Not more than 500 characters and cannot end with a trailing slash
Hub processing:
We can work with <unitid> entries that differ from the Hub EAD requirements, so, for example,
we can add the identifier attribute if it is within the tag content and we can remove the country
and repository codes from the tag content, but we do need a repository code value and the
unique local identifier to be present.
Further Information
The tag content comprises just the local reference:
<unitid countrycode="GB" repositorycode="1234" identifier="MS AB">MS AB</unitid>
The content must not contain: # or % (other non-standard characters may cause issues).
Permitted characters in identifiers
Space
The unreserved characters in RFC 3986 (URI
syntax):
A-Z, a-z
0-9
- (hyphen/minus) U+002D
_ (underscore) U+005F
. (period) U+002E
~ (tilde) U+007E
Punctuation widely used in reference codes:
/ (slash/solidus) U+002F
( (left parenthesis) U+0028
) (right parenthesis) U+0029
[ (left square bracket) U+005B
] (right square bracket) U+005D
: (colon) U+003A
; (semi-colon) U+003B
' (apostrophe) U+0027
, (comma) U+002C
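To check identifiers against the permitted characters listed above, something like the following Python sketch could be used. The names PERMITTED and disallowed_characters are hypothetical; the character set is copied from the list above.

```python
import string

# Space, the RFC 3986 unreserved characters, and the punctuation
# listed above as widely used in reference codes.
PERMITTED = set(string.ascii_letters + string.digits + " -_.~/()[]:;',")

def disallowed_characters(identifier: str) -> list[str]:
    """Return the characters in an identifier that are not permitted."""
    return sorted({ch for ch in identifier if ch not in PERMITTED})
```

An empty result means every character is permitted. The separate rule that an identifier cannot end with a trailing slash is not covered by this check.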
Former and Alternative Reference Codes
We can accept former and alternative reference codes using the label attribute to identify 'former'
and 'alternative'.
<unitid countrycode="GB" repositorycode="1234" identifier="MS X
1" label="former">MS X 1</unitid>
<unitid countrycode="GB" repositorycode="1234" identifier="GaP"
label="alternative">GaP</unitid>
NOTE: Within Calm the 'Alt Ref' is exported as 'former reference'. We can talk to you about how to treat this on ingest. It can be treated as the main reference as long as it is always present and unique.
Title <unittitle>
Status: Mandatory at all levels except 'item' level, where it is still highly recommended.
Requirements: Unique at collection level. It is preferable to exclude the date, as
this is provided in a separate tag.
Hub processing: Repository code is removed if it is present. The title is flagged
up if it is just one word.
Further information
- Only one <unittitle> can be included
- At the top level the title should be more than a single word, to ensure that it is effectively descriptive. It should not be just the name of creator, as this is not a title.
- We recommend 500 characters maximum
- If you don't provide titles at item level, the display of your descriptions may not be optimal, particularly in other systems, and your descriptions may not be valid for some scenarios.
Dates <unitdate>
Status: Mandatory at the top level. Recommended for all levels of description. The
@normal is not mandatory but recommended, as it allows for searching by date.
Requirements: The normalised date, if provided, must be compliant with ISO 8601.
Ongoing dates use '9999' to comply with the YYYY format.
Each date or date range should be enclosed within separate <unitdate> tags. If you have two
date ranges, e.g. 1900-1910 and 1915-1920, use two <unitdate> entries or alternatively put them
into one <unitdate> but with a single normalised value of 1900/1920.
Hub processing: No action on display dates. We may be able to convert
non-standard normalised dates, depending on the patterns used. We convert hyphens to slashes, e.g.
1900-1950 to 1900/1950. We will reject descriptions where the normalised date is invalid and we
cannot modify it, e.g. 109/1950, 1950/1930, ?1950/1980, 1950-01-35
<unitdate normal="1900/1925">Early 20th century</unitdate>
<unitdate normal="1900-01-01/1905-12-31">1 January 1900 - 31 December
1905</unitdate>
<unitdate normal="1952/9999">1952 - ongoing</unitdate>
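A pre-submission check on normalised dates might be sketched as follows. This is an illustration under assumptions, not the Hub's own validation: the function name normalise_date is hypothetical, and only the YYYY, YYYY-MM and YYYY-MM-DD forms, alone or as a start/end pair, are handled.

```python
import re
from datetime import date

_PART = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")

def _parse(part: str) -> tuple[int, int, int]:
    match = _PART.fullmatch(part)
    if match is None:
        raise ValueError(f"not YYYY, YYYY-MM or YYYY-MM-DD: {part!r}")
    year, month, day = (int(g) if g else 1 for g in match.groups())
    date(year, month, day)  # raises ValueError for dates such as 1950-01-35
    return (year, month, day)

def normalise_date(value: str) -> str:
    """Validate a @normal value, converting a YYYY-YYYY range to YYYY/YYYY."""
    if re.fullmatch(r"\d{4}-\d{4}", value):
        value = value.replace("-", "/")
    parts = value.split("/")
    if len(parts) > 2:
        raise ValueError(f"too many parts in date range: {value!r}")
    parsed = [_parse(part) for part in parts]
    if len(parsed) == 2 and parsed[0] > parsed[1]:
        raise ValueError(f"start date is after end date: {value!r}")
    return value
```

With this sketch, 1900-1950 is returned as 1900/1950, 1952/9999 is accepted as an ongoing range, and each of the invalid examples listed above (109/1950, 1950/1930, ?1950/1980, 1950-01-35) raises an error.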
Name of Repository <repository>
Status: Mandatory at the top level
Requirements: Should be consistent and must only contain the repository name
Hub processing: We should be able to remove extraneous information from this
field, and we can ensure that the repository name is always consistent if the preferred name is
supplied.
Extent <physdesc><extent>
Status: Mandatory at the top level
Requirements: No requirements, but consistent use of measurements and units
is recommended.
Hub processing: No actions
NB: The Hub will also display the content of <dimensions>, <genreform> and <physfacet>. See the EAD tag library for more about the use of these tags.
Name of Creator <origination>
Status: Non mandatory. Highly recommended at the top level.
Requirements: Separate names with consistent dividers. Ideally multiple names
should be in separate <origination> tags.
Hub processing: We may be able to split names up where more than one name is given
within one <origination> tag. This will depend upon the form and dividers used.
Further information
- We recommend including a <persname>, <famname> or <corpname> tag to indicate the type of agent.
- We recommend following standard convention, and entering the surname as the initial element.
- You can include the role attribute to indicate the role of the agent, such as 'creator' or 'collector', although this is not utilised on the Hub at the moment.
- If you include a VIAF identifier or similar, use the @authfilenumber attribute:
<origination><persname authfilenumber="http://viaf.org/viaf/35723133/">Wordsworth, William, 1770-1850, poet</persname></origination>
- We highly recommend including the creator name as an index term to help with discovery of your descriptions (see <controlaccess>). It is preferable to enter the name within the <origination> area in the same way as you would enter it as an index term, so that the names match and there is less room for ambiguity.
Language of Material <langmaterial>
Status: Mandatory at the top level.
Requirements: Each language should be within a <language> tag, with the
langcode attribute used for the ISO code value (three
letters, lower case).
If you have descriptive information, it should be within <langmaterial> and not
<language>
Hub processing: We will add the ISO code if it is not present. We can add English
as the default language on request.
<langmaterial>
<language langcode="eng">English</language>
<language langcode="fre">French</language>
</langmaterial>
<langmaterial>Some letters are in German without English translations.
<language langcode="eng">English</language>
<language langcode="ger">German</language>
</langmaterial>
Digital Archival Objects <dao>
Status: Non mandatory
Requirements: If <dao> has an @href attribute, it must include 'http' or
'https'. <dao> may be located outside of the <did>
Hub processing: Images will be displayed within the description, re-sized as a
thumbnail with a link to the full resolution image. Links (e.g. PDF, Word, HTML) will be displayed
within the description, to take the user to the digital content.
The Archives Hub does not use <daogrp>. All instances will be converted to single <dao> instances. Note that <daogrp> is deprecated in EAD3.
Descriptive Information <archdesc>
There are a large number of fields that can be used within the <archdesc> tag to describe the materials. All must use <p> tags to indicate paragraphs. We can add <p> tags if they are absent, but if they are being used in a way that is inconsistent, we may ask you to correct this.
Try to avoid duplicating <p> tags or including self-closing <p/> tags.
<head> tags can be used to provide headings, but they will not be displayed on the Archives Hub as we need to provide a consistent display across all of the data.
Where <archref> is used to reference separate and related materials, if it has a URL included, the @href attribute must include http or https. If the link is to another Archives Hub description, it should use our 'direct link' URIs e.g.
See also <archref href="http://archiveshub.jisc.ac.uk/data/gb123-mcs">The Papers of McConnell and Son</archref>
Scope and Content <scopecontent>
Status: Mandatory at the top level
Requirements: No requirements.
Hub processing: http links will be enabled. Tags may be tidied up, e.g. redundant
paragraph tags.
Arrangement <arrangement>
Status: Non mandatory
Requirements: None
Hub processing: We will try to tidy up list markup if it is incorrect, e.g.
punctuation outside of wrapper tags
Further Information: <ul> and <li> are not valid for lists, as they are not EAD tags.
Biographical and Administrative History <bioghist>
Status: Non mandatory
Requirements: No requirements
Hub processing: We will try to tidy up formatting such as lists and paragraphs.
Further Information: The <bioghist> may include an ID from a separate name authority database, and the content from that database would need to be brought into the EAD to populate the field.
Access Conditions <accessrestrict>
Status: Mandatory at top level
Requirements: None
Hub processing: http URLs will be marked up as links. We can add boilerplate text
if requested.
Other Finding Aids <otherfindaid>
Status: Non mandatory
Requirements: No requirements
Hub processing: We can add template text, or possibly link to individual
descriptions in contributors' catalogues where there is a consistent reference code pattern.
Archival Reference <archref>
Status: Non-mandatory
Requirements: If a link is required, must include an href attribute. Links to Hub
records must use the /data/ URIs
Hub processing: May be able to correct links, depending upon the
error
<p>A full archival history is given in the <archref href="http://archiveshub.jisc.ac.uk/data/gb133-egr">collection-level description</archref> </p>
Reference <ref>
Status: Non mandatory
Requirements: Needs a target collection or component
Hub processing: The link will take the user to the appropriate component if it has
the correct URI or full reference. We do not offer more granular linking, e.g. to a field within a
component.
<p>See also <ref linktype="simple" target="GB-237-Coll-97-CW9">GB 237 Coll-97/CW9 </ref> Topographical Notes by W.J. Watson </p>
NB. EAD3 replaces the use of <archref> and <ref> for linking with just <ref> so an option for creating links would be:
<p>See also <archref><ref href="http://archiveshub.jisc.ac.uk/data/gb237-Coll-97">The Carmichael-Watson Collection</ref></archref></p>
and
<p>See also <archref><ref target="GB-237-Coll-97-CW9">Topographical Notes by W.J. Watson</ref></archref></p>
Other commonly used <archdesc> fields
All must include <p> tags, but there are no specific content requirements.
<custodhist> for information about provenance
<acqinfo> for information about the immediate source of acquisition
<appraisal> for information about any appraisal processes
<accruals> for information about expected accruals
<userestrict> for information about limitations on reproduction
<phystech> for information about physical characteristics or technical requirements affecting
use of the materials
<altformavail> for the existence/location of copies
<originalsloc> for the existence/location of originals
<bibliography> for information on publications based on or significantly utilising the
materials
<otherfindaid> for information about additional finding aids, such as more detailed lists,
often including http links to online catalogues
<relatedmaterial> for information about material related by provenance
Hub data processing
We will try to add <p> tags where appropriate, and signal any fields that have very long
paragraphs, where text may be better broken up into more paragraphs.
Index Terms <controlaccess>
Status: Non mandatory
Requirements: The <controlaccess> tag must be used to wrap index terms. Each
index term must be within: <persname>, <corpname>, <famname>, <subject>,
<geogname>, <title>, <genreform> or <function>. Dividers used should be
consistent. So for example, personal names always have the same format: surname{comma}
forename{comma} dates{comma} epithet.
Further Information:
The Archives Hub has an enhanced markup for index terms using <emph> tags to separate out elements of a term. Our processing will tidy up markup that uses <emph> tags. However, use of this markup is not necessary for data ingest.
Rules and Source @rules @source
Status: Non mandatory but recommended.
Requirements: Recognised values for rules or source. Values should be lower case,
with no spaces
Hub processing: We may correct values e.g. "TNA" to "tna" or "Library of Congress"
to "lcsh". We may report back to you if the source values you use are not recognised.
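The tidy-up of @rules and @source values could look roughly like this in Python. The alias table is purely hypothetical and holds only the example given above; the Hub's actual list of recognised values is not reproduced here.

```python
# Hypothetical alias table: only the example mentioned in this guide.
SOURCE_ALIASES = {"library of congress": "lcsh"}

def normalise_source(value: str) -> str:
    """Lowercase a rules/source value and remove spaces, applying aliases."""
    key = " ".join(value.split()).lower()
    if key in SOURCE_ALIASES:
        return SOURCE_ALIASES[key]
    return key.replace(" ", "")
```

Whether a cleaned value is actually recognised would still need to be checked against the values the Hub accepts.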
Personal Name <persname>
Requirements: Personal names should use the <persname> tag, one for each
name.
If rules are used they should be identified, e.g. rules="ncarules".
<persname rules="ncarules">Smith, John, 1938-1994, Labour
politician</persname>
<persname rules="ncarules">Addison, Christopher, 1869-1951, 1st Viscount Addison,
statesman</persname>
<persname>Courtney, Lady Catherine, 1847-1929, nee Potter, wife of 1st Baron Courtney of
Penwith</persname>
<persname rules="ncarules" source="tna">White, Barry, fl
1835-1845</persname>
Family Name <famname>
Requirements: Family names should use the <famname> tag, one for each name.
If rules are used they should be identified, e.g. rules="ncarules".
<famname rules="ncarules">Hebden family</famname>
<famname rules="ncarules">Wilberforce family, Kingston upon Hull, East Riding of
Yorkshire</famname>
Corporate Name <corpname>
Requirements: Corporate names should use the <corpname> tag, one for each
name.
If rules are used they should be identified, e.g. rules="ncarules".
<corpname rules="aacr2">Birmingham University (Department of
Anatomy)</corpname>
<corpname source="tna">Great Britain and Northern Ireland, Department of Trade and
Industry</corpname>
<corpname rules="ncarules">Royal National Institute of Blind
People</corpname>
Place Name <geogname>
Requirements: Place names should use the <geogname> tag, one for each
place.
If rules are used they should be identified, e.g. rules="ncarules".
If a source is used it should be identified, e.g. source="tgn" (Getty thesaurus) or
source="geonames".
Place names should include the country, so that they are globally identifiable.
<geogname rules="ncarules">Congleton, Cheshire, England</geogname>
<geogname rules="ncarules">York, North Yorkshire, England</geogname>
<geogname>Canada</geogname>
<geogname>10 Downing Street, London, England</geogname>
Subject <subject>
Requirements: Subjects should use the <subject> tag, one for each topic.
If a source is used it should be identified, e.g. source="ukat".
<subject source="ukat">Agriculture</subject>
<subject source="lcsh">Glass blowing and working -- History -- 18th
century</subject>
Genre and Form <genreform>
Requirements: Use the <genreform> tag, one for each entry. It is preferable to take terms from a recognised source.
<genreform source="aat">CD-ROMs</genreform>
<genreform source="aat">dry collodion negatives</genreform>
<genreform source="tgm">Theatrical posters</genreform>
Publication Title <title>
Requirements: The title of a published work, optionally with the date.
<title rules="aacr2">The World Turned Upside Down <date>1973</date></title>
Description of Subordinate Components <dsc>
Status: Mandatory if providing lower level entries
We may convert numbered <c0n> tags to un-numbered tags, particularly if the numbered nesting is incorrect (e.g. <c01> followed by <c03>).
Components <c>
Status: Required to wrap each component
Requirements: Must be nested correctly in order for us to display the data
correctly. We recommend unnumbered, but we can ingest numbered <c0n> tags.
<dsc>
<c level="subfonds">....
<c level="series">....
<c level="item">....
</c>
</c>
</c>
</dsc>
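If your system exports numbered components, converting them to unnumbered <c> tags before submission could be sketched as below. This uses Python's standard xml.etree.ElementTree and assumes EAD 2002 without namespaces; the function name unnumber_components is hypothetical and this is not the Hub's own conversion.

```python
import re
import xml.etree.ElementTree as ET

# EAD 2002 defines numbered components <c01> to <c12>.
NUMBERED = re.compile(r"c(0[1-9]|1[0-2])")

def unnumber_components(dsc: ET.Element) -> ET.Element:
    """Rename every <c01>..<c12> under the given element to <c>, in place."""
    for element in dsc.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(element.tag, str) and NUMBERED.fullmatch(element.tag):
            element.tag = "c"
    return dsc

example = ET.fromstring(
    '<dsc><c01 level="series"><c03 level="item"/></c01></dsc>'
)
unnumber_components(example)
```

Because the nesting of the elements is left untouched, the hierarchy is preserved even where the original numbering skipped a level, as in the <c01> followed by <c03> case.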
Component level mandatory fields
Lower level entries require only the following fields in order to be valid:
<c level="{value}"> or <c01 level="{value}">
<unitid countrycode="GB" repositorycode="{nnn}" identifier="{local
reference}">{local reference}</unitid>
<unittitle>{title of unit}</unittitle>
</c>
We recommend including <unitdate> at all levels, and it is also common to have a short <scopecontent>.
Any fields included at the top level within the <archdesc> can be included at the component level, including index terms.