Data Processing
The Archives Hub has implemented an innovative data processing workflow. The aims of our workflow are:
- To provide consistent processing of data (descriptions of archives and repositories)
- To allow for data to be revised at any time, with no version control issues
- To ensure a flexible approach that can be used for the wide variety of data we ingest
- To work effectively with both large-scale batch uploads and individual description uploads
- To provide contributors with an interface to manage their data
- To easily make changes over time in response to changing archival management systems and other external factors
We ingest XML (EAD for archive descriptions). This is processed through our pipelines, which include a number of XSLT scripts that perform various data checking, normalisation and enhancement tasks.
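As a simplified illustration (not our actual code), a single XSLT step can be applied to an EAD file using Python's lxml library; the file and script names here are purely examples:

```python
# A minimal sketch of applying one XSLT checking/normalisation step
# to an EAD description. File names below are hypothetical examples.
from lxml import etree

def apply_step(ead_doc, xslt_path):
    """Apply a single XSLT step (e.g. a normalisation script) to an EAD document."""
    transform = etree.XSLT(etree.parse(xslt_path))
    return transform(ead_doc)

ead = etree.parse("example-ead.xml")          # hypothetical input description
ead = apply_step(ead, "normalise-dates.xsl")  # hypothetical normalisation script
```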
Our XSLT scripts can be grouped so that we can create pipelines that work for various systems, such as Calm, AdLib and AtoM. We can also apply global edits to all data, and run individual scripts where needed. This gives us a very flexible way to ingest, normalise and, where appropriate, enhance data.
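A pipeline of this kind could be sketched as a set of global steps plus per-system steps applied in order; again, the script names below are illustrative only, not the scripts we actually use:

```python
# A sketch of grouping XSLT steps into per-system pipelines, with global
# scripts applied to all data first. All script names are hypothetical.
from lxml import etree

GLOBAL_STEPS = ["check-required-fields.xsl", "normalise-dates.xsl"]
SYSTEM_STEPS = {
    "calm":  ["calm-fixes.xsl"],
    "adlib": ["adlib-fixes.xsl"],
    "atom":  ["atom-fixes.xsl"],
}

def run_pipeline(ead_path, system):
    """Run the global steps, then the steps specific to the source system."""
    doc = etree.parse(ead_path)
    for xslt_path in GLOBAL_STEPS + SYSTEM_STEPS.get(system, []):
        doc = etree.XSLT(etree.parse(xslt_path))(doc)
    return doc
```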
Data workflow diagram