WikiLinks

WikiLinks – Guest Blog by Andy Young

Between March and June 2014 I conducted a piece of social media-oriented research on behalf of the Archives Hub, the primary purpose of which was to measure the impact of adding links from specific Wikipedia articles featuring Hub content on the traffic that comes into the Hub website. As well as providing the Hub administrators – and, indeed, the profession as a whole – with a gauge as to whether the amount of time invested in creating links is worthwhile when compared to the benefits of impact, this research benefitted me personally in that it allowed me the opportunity to potentially earn credits on the Archives & Records Association’s Registration Scheme, under the ‘Contributions to the profession’ category.

The first phase of the study involved me identifying twenty archival collections listed in the Hub, with no existing links to related Wikipedia pages, which I could treat as measurable research subjects. This was done simply by entering specific Hub collection-level descriptions into the Wikipedia search engine. (If a link to the Hub had already been created, I eliminated that particular collection from the study.) In order to achieve a fair and balanced piece of research, I selected collections of a relatively similar size and status, and avoided those relating to any significant public events running concurrently with, or immediately prior to, the commencement of the research, e.g. local elections in England or the World Cup. My feeling was that such collections could have been subject to closer scrutiny from researchers while the study was underway, which, in turn, would have resulted in an unexpected increase in Hub-searching activity. This, in essence, would have undermined the credibility of the study. I also made sure that the Wikipedia pages I utilised didn’t already include links to the collection-holding repositories, as this could potentially sway researchers away from clicking the newly-created links to the Hub descriptions, thereby affecting the accuracy of the research.

The twenty collections selected, along with their corresponding Wikipedia links, are shown in the table below.

List of Collections used in the study with the Wikipedia URLs

Once the Hub collections and related Wikipedia pages had been identified, I then added new links to the individual pages using Wikipedia’s built-in editing tool. In the interests of consistency, I embedded each new link in the ‘External Links’ section on each of the pages I modified. I then used Google Analytics, in conjunction with an Excel spreadsheet, to collate and record Hub traffic data for each individual collection for the twelve-week period prior to the start of the study, specifically from the 22nd December, 2013 to the 15th March, 2014. This was done in order to enable me to generate a measurement of the overall impact of the newly-created links on incoming Hub traffic. The cumulative results for each collection, for the twelve-week period prior to the commencement of the study, are shown below.

Page views for collections in a 12 week period prior to the creation of the Wikipedia links

Over the course of the next twelve weeks, from the 17th March, 2014 to the 7th June, 2014, I used Google Analytics once again to monitor incoming Hub traffic, with a reading being taken at the end of every fourth week in order to identify any significant traffic fluctuations or changes. The four-week hit statistics for each of the twenty collections are shown in the table below.

Hits for Hub collections during the Wikipedia study

At the end of the twelve-week research period it was evident from the accumulated data that fourteen of the twenty collections had each experienced an increase in traffic compared to the previous twelve-week period. Indeed, of the fourteen, two collections, namely the Ramsay MacDonald Papers and the London South Bank University Archives, had each received well in excess of 100 additional hits compared to the pre-link period. Of the remaining six collections, only the Sadler’s Wells Theatre Archive had decreased in hits significantly, down 109 from the previous period. Although it isn’t possible to say definitively why this decrease occurred, it may have been due to the fact that at some point during the research, a new link had been added to the Sadler’s Wells Theatre Archive Wikipedia page giving researchers the option to examine ‘Archival material relating to Sadler’s Wells Theatre listed at the UK National Archives.’ Taking this modification into account, it seems fair to suggest that any researchers interested in the Sadler’s Wells Theatre material may have been drawn to this link description rather than the newly-added link to the Hub description essentially because it makes mention of the country’s principal archival repository, TNA.

The cumulative number of hits for each of the twenty collections during the research period is presented in the table below. This table also shows the positive and negative numerical differences in hits for each of the collections compared to the twelve-week period prior to the start of the research.

Cumulative hits for collections with positive and negative differences shown
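
For anyone wishing to repeat the comparison with their own analytics export, the before-and-after arithmetic described above can be sketched in a few lines of Python. The collection names and figures below are placeholders invented for illustration; they are not the study's data, and the function names are assumptions rather than part of any tool used in the research.

```python
def hit_changes(before, after):
    """Return the change in hits (after minus before) for each collection."""
    return {name: after[name] - before[name] for name in before}


def split_by_direction(changes):
    """Separate collections whose hits rose from those whose hits fell."""
    increased = sorted(name for name, delta in changes.items() if delta > 0)
    decreased = sorted(name for name, delta in changes.items() if delta < 0)
    return increased, decreased


# Placeholder figures for illustration only; these are NOT the study's data.
before = {"Collection A": 40, "Collection B": 120, "Collection C": 15}
after = {"Collection A": 65, "Collection B": 90, "Collection C": 15}

changes = hit_changes(before, after)
increased, decreased = split_by_direction(changes)
for name, delta in changes.items():
    # The "+d" format shows an explicit sign for rises and falls alike.
    print(f"{name}: {delta:+d}")
```

Collections with no change appear in neither list, which keeps the counts of rises and falls separate from the unchanged cases.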

Conclusion

This piece of research has demonstrated that the simple task of linking online archival descriptions to a popular social media reference tool such as Wikipedia can yield extremely positive results. It has shown, moreover, that there are clear benefits, both for the archival repository/aggregator and the individual researcher, when catalogue data is linked and shared. Not only that, it has proven that a successful outcome can be achieved in a relatively short space of time, and, truth be told, with only a small amount of physical effort. The process of checking whether links from specific Hub collections already existed in Wikipedia, and adding them where they didn’t, took little more than three hours to complete, and, for the most part, basically involved me copying data from one website and pasting it onto another. Ultimately, the sheer simplicity of this exercise, coupled with the knowledge that interest in the vast majority of the Hub collections increased as a result of the Wikipedia editing, confirms, to my mind at least, that archive services the world over – especially those blessed with a healthy number of volunteers – would benefit from embarking on linked data projects of this nature. After all, as Benjamin Franklin said, “An investment in knowledge always pays the best interest.”

Kettle’s Yard Archive

Archives Hub feature for September 2014

Kettle’s Yard House, University of Cambridge

Kettle’s Yard – A Way of Life

Kettle’s Yard is a unique and special place.  It is so much more than a house, a museum or a gallery, and it invariably leaves a lasting impression with those who visit.

Between 1958 and 1973, Kettle’s Yard was the home of Jim and Helen Ede. In the 1920s and 30s, Jim had been a curator at the Tate Gallery in London. It was during this time that he formed friendships with artists and other like-minded people, which allowed him to gather a remarkable collection of works by artists such as Ben and Winifred Nicholson, Alfred Wallis, Christopher Wood, David Jones, Joan Miró, Henri Gaudier-Brzeska, Constantin Brancusi, Henry Moore and Barbara Hepworth. Ede also shared with many of his artist friends a fascination for beautiful natural objects such as pebbles, weathered wood, shells or feathers, which he also collected.

Jim carefully positioned artworks alongside furniture, glass, ceramics and natural objects, with the aim of creating a perfectly balanced whole. His vision was of a place that should not be

“an art gallery or museum, nor … simply a collection of works of art reflecting my taste or the taste of a given period. It is, rather, a continuing way of life from these last fifty years, in which stray objects, stones, glass, pictures, sculpture, in light and in space, have been used to make manifest the underlying stability.”

Jim Ede’s bedroom table – Kettle’s Yard, University of Cambridge. Photo: Paul Allitt.

Jim originally envisaged making a home for his collection in quite a grand house, but unable to find a suitable property, he opted instead to remodel four derelict 19th century cottages and convert them into a single house.

Kettle’s Yard was conceived with students in mind, as ‘a living place where works of art could be enjoyed . . . where young people could be at home unhampered by the greater austerity of the museum or public art gallery.’  Jim Ede kept ‘open house’ every afternoon of term, personally guiding his visitors around his home. This experience is still faithfully recreated as visitors ring the bell at the front door, and are welcomed into the house.

Jim Ede at Kettle’s Yard – Kettle’s Yard, University of Cambridge

In 1966 Jim gave the house and its contents to the University of Cambridge, though he continued to occupy and run it until 1973. In 1970, the house was extended, and an exhibition gallery added to ensure that there would always be a dynamic element to Kettle’s Yard, with space for contemporary exhibitions, music recitals and other public events.

The archive

If Kettle’s Yard is the ultimate expression of a way of life developed over 50 years and more, the archive adds an extra dimension by documenting the rich story of how that philosophy evolved. At its core are Jim Ede’s personal papers, which chart a wide range of influences throughout his life: his experience of World War I; the ‘open house’ the Edes kept in Hampstead through the late 1920s and early 1930s, and the vibrant set who attended their parties; the weekend retreats for servicemen on leave from Gibraltar at the Edes’ house in Tangier at the end of World War II; the ‘lecturer in search of an audience’ who travelled to the US in the early 1940s; the prolific correspondence not just with artist friends, but with figures such as T E Lawrence; and the development of Kettle’s Yard and its collections.

Thanks to the support of the Newton Trust, we are now halfway through a 2-year project to improve access to the archive and support research by producing a digital catalogue of the collections, putting in place proper preservation strategies, and establishing procedures for public access. This work builds on the foundations laid by the dedicated archive volunteers, who continue to work with us.

We have started out by publishing a high-level description of the Ede papers on the Archives Hub [http://archiveshub.ac.uk/data/gb1759-ky/ede?page=1#id1308050], to which we will add more detail over the coming year. The catalogue already includes detailed descriptions of c. 120 letters Jim Ede received from the artist and writer David Jones between 1927 and 1971, and c. 200 from the collector and patron Helen Sutherland, from 1926 to 1964. We will soon be adding correspondence with the artists Ian Hamilton Finlay and Richard Pousette-Dart, and the museum director Perry Rathbone; papers relating to Jim Ede’s lifelong mission to promote the work of Henri Gaudier-Brzeska, and the establishment and running of Kettle’s Yard; and other small collections such as Helen Sutherland’s letters to the poet Kathleen Raine.

In another exciting development, Kettle’s Yard has now received backing from the Arts Council England Capital Investment Programme Fund to create a new Education Wing and carry out major improvements to the exhibition galleries.  The plans [http://www.kettlesyard.co.uk/development/index.php] include a purpose-built archive store and dedicated space for consulting and exhibiting archive material.

One recent addition to the archive is a letter that Jim Ede wrote in 1964, in response to a thank you note from an undergraduate who had visited Kettle’s Yard.  In typical style, Jim expresses concern about whether he really is providing pleasure to others through his endeavours at Kettle’s Yard, and draws strength from the expression of gratitude.  He ends the letter ‘Do come in as often as you like – the place is only alive when used’.

“the place is only alive when used” – Kettle’s Yard Archive, University of Cambridge

This is very true of the house, but equally true of the archive – and hopefully everything we are doing to improve physical and intellectual access to the archive, and integrate it into all aspects of the Kettle’s Yard programme, will ensure that it is well used.

Frieda Midgley, Archivist
Kettle’s Yard, University of Cambridge

All images copyright Kettle’s Yard, University of Cambridge, and reproduced with the kind permission of the copyright holder.

Micro sites: local interfaces for Archives Hub contributors

Background

Back in 2008 the Archives Hub embarked upon a project to become distributed; the aim was to give control of their data to the individual contributors. Every contributor could host their own data by installing and running a ‘mini Hub’. This would give them an administrative interface to manage their descriptions and a web interface for searching.

Five years later we had six distributed ‘spokes’ for six contributors. This was down from a peak of eight, the highest number of institutions that took up the invitation to hold their own data, out of around 180 contributors at the time.

The primary reason for the lack of success was identified as a lack of the technical knowledge and skills required for setting up and maintaining the software. In addition, many institutions are not willing to install unknown software or maintain an unfamiliar operating system. Of course, many Hub contributors already had a management system, and so they may not have wanted to run a second system; but a significant number did not (and still don’t) have their own system. Part of the reason many institutions want an out-of-the-box solution is that they do not have consistent or effective IT support, so they need something that is intuitive to use.

The spokes institutions ended up requiring a great deal of support from the central Hub team; and at the same time they found that running their spoke took a good deal of their own time. In the end, setting up a server with an operating system and bespoke software (Cheshire in this case) is not a trivial thing, even with step-by-step instructions, because there are many variables and external factors that impact on the process. We realised that running the spokes effectively would probably require a full-time member of the Hub team in support, which was not really feasible, but even then it was doubtful whether the spokes institutions could find the IT support they required on an ongoing basis, as they needed a secure server and they needed to upgrade the software periodically.

Another big issue with the distributed model was that the central Hub team could no longer work on the Hub data in its entirety, because the spoke institutions had the master copy of their own data. We are increasingly keen to work cross-platform, using the data in different applications. This requires the data to be consistent, and therefore we wanted to have a central store of data so that we could work on standardising the descriptions.

The Hub team spend a substantial amount of time processing the data, in order to be able to work with it more effectively. For example, a very substantial (and continuing) amount of work has been done to create persistent URIs for all levels of description (e.g. series, item). This requires rigorous consistency and no duplication of references. When we started to work on this we found that we had hundreds of duplicate references due to both human error and issues with our workflow (which in some cases meant we had loaded a revised description along with the original description). Also, because we use archival references in our URIs, we were somewhat nonplussed to discover that there was an issue with duplicates arising from references such as GB 234 5AB and GB 2345 AB, which become identical once the spaces are removed. We therefore had to change our URI pattern, which led to substantial additional work (we used a hyphen to create gb234-5ab and gb2345-ab).
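
To make the reference problem concrete, here is a minimal Python sketch of how the clash arises and how a hyphen resolves it. It is an illustration only, not the Hub's actual code: the function names are invented, and the assumption that a reference always splits into a country code, a repository code and a local part is ours.

```python
from collections import defaultdict


def naive_slug(reference):
    """Strip all whitespace and lower-case the reference (the clash-prone approach)."""
    return "".join(reference.split()).lower()


def hyphenated_slug(reference):
    """Join country and repository codes, then add the local part after a hyphen.

    Assumes the reference has at least three whitespace-separated parts.
    """
    country, repository, local = reference.split(maxsplit=2)
    local = "".join(local.split())  # assumption: spaces inside the local part are dropped
    return f"{country}{repository}-{local}".lower()


def find_collisions(references, slug_function):
    """Group references by slug and return only the slugs shared by more than one."""
    groups = defaultdict(list)
    for reference in references:
        groups[slug_function(reference)].append(reference)
    return {slug: refs for slug, refs in groups.items() if len(refs) > 1}


references = ["GB 234 5AB", "GB 2345 AB"]
print(find_collisions(references, naive_slug))
print(find_collisions(references, hyphenated_slug))
print([hyphenated_slug(r) for r in references])
```

Running a collision check like this over the whole data set before minting URIs is one way to catch duplicates early, whatever slug pattern is chosen.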

We also carry out more minor data corrections, such as correcting character encoding (usually an issue with characters such as accented letters) and creating normalised dates (machine processable dates).
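
The two kinds of correction mentioned above can be sketched roughly as follows. This is a simplified illustration rather than our actual routines: the function names are invented, the encoding repair handles only the common case of UTF-8 text that was mis-read as Latin-1, and the date routine deliberately gives up on anything other than a single year or a simple year range.

```python
import re

# Matches a standalone four-digit year between 1000 and 2099.
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def repair_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as Latin-1; leave other text alone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def normalise_date(text):
    """Return 'YYYY' or 'YYYY/YYYY' for simple dates, or None if unsure."""
    years = YEAR.findall(text)
    if len(years) == 1:
        return years[0]
    if len(years) == 2 and years[0] <= years[1]:
        return f"{years[0]}/{years[1]}"
    return None  # anything more complicated is left for a person to check


print(repair_mojibake("Mir\u00c3\u00b3"))
print(normalise_date("1927-1971"))
print(normalise_date("c. 1964"))
print(normalise_date("1920s"))
```

Returning None for anything ambiguous, rather than guessing, keeps the automated pass from silently introducing wrong dates.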

In addition to these types of corrections, we run validation checks and correct anything that is not valid according to the EAD schema, and we are planning, longer term, to set up a workflow such that we can implement some enhancement routines, such as adding a ‘personal name’ or ‘corporate name’ identifying tag to our creator names.

These data corrections/enhancements have been applied to data held centrally. We have tried to work with the distributed data, but it is very hard to maintain version control, as the data is constantly being revised, and we have ended up with some instances where identifying the ‘master’ copy of the data has become problematic.

We are currently working towards a more automated system of data corrections/enhancement, and this makes it important that we hold all of the data centrally, so that we ensure that the workflow is clear and we do not end up with duplicate, slightly different versions of descriptions. (NB: there are ways to work more effectively with distributed data, but we do not have the resources to set up this kind of environment at present; it may be something for the longer term.)

We concluded that the distributed model was not sustainable, but we still wanted to provide a front-end for contributors. We therefore came up with the idea of the ‘micro sites’.

What are Hub Micro Sites?

The micro sites are a template-based local interface for individual Hub contributors. They use a feed of the contributor’s data from the central Archives Hub, so the data is only held in one place but accessible through both interfaces: the Hub and the micro site. The end-user performs a search on a micro site, the search request goes to the central Hub, and the results are returned and displayed in the micro site interface.

Brighton Design Archives micro site homepage

The principles underlying the micro sites are that they need to be:

•    Sustainable
•    Low cost
•    Efficient
•    Realistically resourced

A Template Approach?

As part of our aim of ensuring a sustainable and low-cost solution we knew we had to adopt a one-size-fits-all model. The aim is to be able to set up a new micro site with minimal effort, as the basic look and feel stays the same. Only the branding, top and bottom banners, basic text and colours change. This gives enough flexibility for a micro site to reflect an institution’s identity, through its logo and colours, but it means that we avoid customisation, which can be very time-consuming to maintain.
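
As a rough illustration of the one-size-fits-all idea, the sketch below fills a single shared stylesheet template with a handful of per-institution values. It is written in Python purely for illustration and is not the micro sites code; the CSS selectors, the institution key, the colours and the logo path are all invented placeholders.

```python
from string import Template

# One shared template; only the values marked with $ differ between institutions.
CSS_TEMPLATE = Template(
    """\
.banner { background-color: $banner_colour; }
a { color: $link_colour; }
.logo { background-image: url("$logo_url"); }
"""
)

# Hypothetical branding settings, one entry per institution.
BRANDING = {
    "example-archive": {
        "banner_colour": "#003366",
        "link_colour": "#005599",
        "logo_url": "logos/example-archive.png",
    },
}


def render_stylesheet(institution):
    """Fill the shared template with one institution's branding values."""
    # substitute() raises KeyError if a value is missing, so gaps are caught early.
    return CSS_TEMPLATE.substitute(BRANDING[institution])


css = render_stylesheet("example-archive")
print(css)
```

Because everything outside the placeholders is shared, a change to the layout only has to be made once, in the template.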

The micro sites use an open approach, so it would be possible for institutions to customise themselves, by manipulating the stylesheets. However, this is not something that the Archives Hub can support, and therefore the institution would need to have the expertise necessary to maintain this themselves.

The Consultation Process

We started by talking to the Spokes institutions and getting their feedback about the strengths and weaknesses of the spokes and what might replace them. We then sent out a survey to Hub contributors to ascertain whether there would be a demand for the micro sites.

Institutions preferred the micro sites to be hosted by the Archives Hub. This reflects the lack of technical support within UK archives. This solution is also likely to be more efficient for us, as providing support at a distance is often more complicated than maintaining services in-house.

The responders generally did not have images displayed on the Hub, but intended to in the future, so this needed to be taken into account. We also asked about experiences with understanding and using APIs. The response showed that people had no experience of APIs and did not really understand what they were, but were keen to find out more.

We asked for requirements and preferences, which we have taken into account as much as possible, but we explained that we would have to take a uniform approach, so it was likely that there would need to be compromises.

After a period of development, we met with the early adopters of the micro sites (see below) to update them on our progress and get additional requirements from them. We considered these requirements in terms of how practical they would be to implement in the time scale that we were working towards, and we then prioritised the requirements that we would aim to implement before going live.

The additional requirements included:

  • Search in multi-level description: the ability to search within a description to find just the components that include the search term
  • Reference search: useful for contributors for administrative purposes
  • Citation: title and reference, to encourage researchers to cite the archive correctly
  • Highlight: highlighting of the search term(s)
  • Links to ‘search again’ and to ‘go back’ to the collection result
  • The addition of Google Analytics code in the pages, to enable impact analysis
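
Two of the requirements above, searching within a multi-level description and highlighting the search term, can be sketched in Python as follows. This is a simplified illustration and not the micro sites implementation: the function names and the component references are invented, and square brackets stand in for whatever markup a real page would use for highlighting.

```python
import re


def matching_components(components, term):
    """Return only the components whose title contains the search term."""
    needle = term.casefold()
    return [c for c in components if needle in c["title"].casefold()]


def highlight(text, terms, before="[", after="]"):
    """Wrap each occurrence of any search term in marker strings."""
    terms = [t for t in terms if t]
    if not terms:
        return text
    # Longest terms first, so a longer term wins over a shorter one it contains.
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True)),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"{before}{m.group(0)}{after}", text)


# Hypothetical references; the titles are used here only as sample text.
components = [
    {"reference": "EX/1/1", "title": "Love on the Dole"},
    {"reference": "EX/1/2", "title": "A Taste of Honey"},
]
hits = matching_components(components, "dole")
print([highlight(c["title"], ["dole"]) for c in hits])
```

The match is case-insensitive, and the highlighted text keeps the capitalisation of the original description rather than that of the search term.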

The Development Process

We wanted the micro sites to be a ‘stand alone’ implementation, not tied to the Archives Hub. We could have utilised the Hub, effectively creating duplicate instances of the interface, but this would have created dependencies.  We felt that it was important for the micro sites to be sustainable independent of our current Hub platform.

In fact, the micro sites have been developed using Java, whereas the Hub uses Python, a completely different programming language. This happened mainly because we had a Java programmer on the team. It may seem a little odd to do this, as opposed to simply filtering the Hub data with Python, but we think that it has had unforeseen benefits: the programmers who have worked on the micro sites have been able to come at the task afresh, and work on new ways to solve the many challenges that we faced. As a result of this we have implemented some solutions with the micro sites that are not implemented on the Hub. Equally, there were certainly functions within the Hub that we could not replicate with the micro sites – mainly those that were specifically set up for the aggregated nature of the Hub (e.g. browsing across the Hub content).

It was a steep learning curve for a developer, as the development required a good understanding of hierarchical archival descriptions, and also an appreciation of the challenges that come from a diverse data set. As with pretty much all Hub projects, it is the diverse nature of the data set that is the main hurdle. Developers need patterns; they need something to work with, something consistent. There isn’t too much of that with aggregated archives catalogues!

The developer utilised what he could from the Hub, but it is the nature of programming that reverse engineering of someone else’s code can be a great deal harder than re-coding, so in many cases the coding was done from scratch. For example, the table of contents is a particularly tricky thing to recreate, but the code used for the current Hub proved to be too complex to work with, as it has been built up over a decade and is designed to work within the Hub environment. The table of contents requires the hierarchy to be set out, collapsible folder structures, links to specific parts of the description with further navigation from there to allow the researcher to navigate up and down, so it is a complex thing to create and it took some time to achieve.
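
To give a flavour of what building a table of contents involves, here is a minimal Python sketch that walks the nested components of an EAD description and produces an indented outline. It is far simpler than either the Hub or the micro sites code (there are no collapsible folders or links), the function names are invented, and the sample fragment and its titles are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Made-up fragment; the titles are placeholders.
SAMPLE = """
<dsc>
  <c01 level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <c02 level="file"><did><unittitle>Letters received</unittitle></did></c02>
    <c02 level="file"><did><unittitle>Letters sent</unittitle></did></c02>
  </c01>
  <c01 level="series">
    <did><unittitle>Photographs</unittitle></did>
  </c01>
</dsc>
"""

# EAD components are either unnumbered <c> or numbered <c01> to <c12>.
COMPONENT_TAG = re.compile(r"c(0[1-9]|1[0-2])?$")


def build_toc(parent):
    """Recursively collect the titles of component elements beneath parent."""
    toc = []
    for child in parent:
        if COMPONENT_TAG.match(child.tag):
            title = child.findtext("did/unittitle", default="[untitled]").strip()
            toc.append({"title": title, "children": build_toc(child)})
    return toc


def render(toc, depth=0):
    """Flatten the nested table of contents into indented lines of text."""
    lines = []
    for entry in toc:
        lines.append("  " * depth + entry["title"])
        lines.extend(render(entry["children"], depth + 1))
    return lines


outline = render(build_toc(ET.fromstring(SAMPLE)))
print("\n".join(outline))
```

The real difficulty lies less in the recursion than in the variety of the data: descriptions differ in depth, in whether components are numbered, and in which elements are present at each level.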

The feed of data has to provide the necessary information for the creation of the hierarchy, and our feed comes through SRU (Search/Retrieve via URL), which is a standard search protocol for Internet search queries using Contextual Query Language (CQL). This was already available through the Hub API, and the micro sites application makes use of SRU in order to perform most of the standard searches that are available on the Hub. Essentially, all of the micro sites are provided by a single web application that acts as a layer on the Archives Hub. To access an individual micro site, a shortened version of the institution’s name is included as a sub-string of the micro sites web address. This then filters the data accordingly for that institution, and sets up the site with the appropriate branding. The latter is achieved through CSS stylesheets, individually tailored for the given institution by a stand-alone Java application and a standard CSS template.
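
For readers unfamiliar with SRU, the sketch below shows roughly what a search request restricted to one institution might look like. The request parameters (operation, version, query, startRecord, maximumRecords) are standard SRU; everything else is an assumption made for illustration, including the endpoint address, the name of the institution index and the short name used, none of which is the Hub's real configuration. It is written in Python rather than the Java used for the micro sites.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SRU_ENDPOINT = "https://hub.example.org/sru"  # hypothetical endpoint
INSTITUTION_INDEX = "example.institution"  # hypothetical CQL index name


def quote_cql(value):
    """Quote a value as a CQL string, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_sru_url(institution, search_term, start_record=1, maximum_records=20):
    """Build a searchRetrieve URL that limits the search to one institution."""
    cql = f"{quote_cql(search_term)} and {INSTITUTION_INDEX} = {quote_cql(institution)}"
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{SRU_ENDPOINT}?{urlencode(params)}"


url = build_sru_url("salford", "love on the dole")
print(url)
# Decode the query parameter again to show the CQL that the server would receive.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Keeping the institution filter inside the query, rather than in separate copies of the data, is what allows one web application to serve every micro site from the same central store.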

Page Display

One of the changes that the developer suggested for the micro sites concerns the intellectual division of the descriptions. On the current Hub, a description may carry over many pages, but each page does not represent anything specific about the hierarchy; it is just a case of the description continuing from one page to the next. With the micro sites we have introduced the idea that each ‘child’ description of the top level is represented on one page. This can more easily be shown through a screenshot:

Table of Contents of the Walter Greenwood Collection showing the tree structure

In the screenshot above, the series ‘Theatre Programmes, Playbills, etc’ is a first-level child description (a series description) of the archive collection ‘The Walter Greenwood Collection’.  Within this series there are a number of sub-series, the first of which is ‘Love on the Dole’, the last of which is ‘A Taste of Honey’. The researcher will therefore get a page that contains everything within this one series – all sub-series and items – if there are any described in the series.

Page for ‘Theatre Programmes, Playbills, etc’ within the Walter Greenwood Collection

The sense of hierarchy and belonging is further reinforced by repeating the main collection title at the top of every right-hand pane. The only potential downside to this approach is that it leads to variable length ‘child’ description pages, but we felt it was a reasonable trade-off because it enables the researcher to get a sense of the structure of the collection. Usually it means that they can see everything within one series on one page, as this is the most typical first child level of an archival description. In EAD representation, this is everything contained within the <c01> tag or top level <c> tag.
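
The page division described above can be sketched in Python as follows: each top-level component becomes one page, carrying the collection title and everything nested beneath it. This is an illustration rather than the micro sites code; the function name and the shape of the result are invented, and the EAD fragment is a cut-down example built from the titles mentioned in this post, not the real Salford description.

```python
import re
import xml.etree.ElementTree as ET

# Cut-down, made-up fragment using titles mentioned in this post.
EAD_FRAGMENT = """
<archdesc level="collection">
  <did><unittitle>The Walter Greenwood Collection</unittitle></did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Theatre Programmes, Playbills, etc</unittitle></did>
      <c02 level="subseries"><did><unittitle>Love on the Dole</unittitle></did></c02>
      <c02 level="subseries"><did><unittitle>A Taste of Honey</unittitle></did></c02>
    </c01>
  </dsc>
</archdesc>
"""

# EAD components are either unnumbered <c> or numbered <c01> to <c12>.
COMPONENT_TAG = re.compile(r"c(0[1-9]|1[0-2])?$")


def pages_for(archdesc):
    """Return one page per top-level component, with its nested titles."""
    collection_title = archdesc.findtext("did/unittitle", default="").strip()
    pages = []
    dsc = archdesc.find("dsc")
    if dsc is None:
        return pages
    for top in dsc:
        if top.tag not in ("c01", "c"):
            continue
        # iter() yields the top-level component first, then its descendants in order.
        titles = [
            element.findtext("did/unittitle", default="").strip()
            for element in top.iter()
            if COMPONENT_TAG.match(element.tag)
        ]
        pages.append(
            {
                "collection": collection_title,  # repeated at the top of every page
                "page_title": titles[0],
                "contents": titles[1:],
            }
        )
    return pages


pages = pages_for(ET.fromstring(EAD_FRAGMENT))
print(pages)
```

Because each page is built from a whole top-level component, its length depends entirely on how much has been catalogued beneath that component, which is the variable page length mentioned above.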

Next Steps

We are currently testing the micro sites with early adopters: Glasgow University Archive Services, Salford University Archives, Brighton Design Archives and the University of Manchester John Rylands Library.

We aim to go live during September 2014 (although it has been hard to fix a live date, as with a new and innovative service such as the micro sites unforeseen problems tend to emerge with alarming regularity). We will see what sort of feedback we get, and it is likely that we will find a few things need addressing as a result of putting the micro sites out into the big wide world. We intend to arrange a meeting for the early adopters to come together again and feed back to us, so that we can consider whether we need a ‘phase 2’ to iron out any problems and make any enhancements. We may at that stage invite other interested institutions, to explain the process and look at setting up further sites. But certainly our aim is to roll out the micro sites to other Archives Hub institutions.