The Archives Hub transition
By Steve Tattersall
The process of moving from our existing software to Cheshire 3 has presented a number of significant challenges over the past few months. From a personal perspective, this has been the most intense period of my years of service to the Hub team; however, it has also been the most rewarding.
Changes
The Hub has been using Cheshire 2 on a Linux server, with a website that has been tweaked and modified over the past few years but has fundamentally remained the same. Throughout that time the Hub has always tried to move forward and improve standards whilst maintaining a key service used by the archive and research community.
However, having worked on the Hub in a technical role for some time, we have seen the limitations of Cheshire 2 and felt a pressing need to move forward onto Cheshire 3, to embrace new technologies, develop our identity as a service and improve on the already successful service we provide.
As a result, the Hub has been given a complete overhaul, with the Hub service installed on new cutting-edge hardware with a new system architecture, new website and new application software. The Hub now uses a Sun server, Solaris Zones running Solaris 10, Cheshire 3 Hub application software and a rebranded website with a more modern look and improved navigation.
It would be difficult to describe in detail all the differences in the Hub software, but one of the biggest changes is the Cheshire 3 code, which is vastly different from Cheshire 2. Cheshire 3 is faster, uses Z39.50, SRW, OAI-PMH and XML, and is the next generation of Cheshire, designed around a distributable, object-orientated model.
Changes to the Cheshire Hub application software have seen us move from an impenetrable 5000-line CGI web search script used with Cheshire 2 to a more modular coding approach, which simplifies the process of diagnosing and fixing errors and will help us develop new features on the Hub in the future.
In addition, the use of XSLT and Python now gives us more control over the way our descriptions and search results are displayed, which is a big improvement on the previous Hub software.
Challenges
During the Hub migration process we faced delays to the delivery and installation of the hardware, and also the loss of a key member of systems administration staff. These were obstacles that needed to be overcome, but weekly hardware refresh meetings helped keep everyone focused and heading in the right direction.
It is probably obvious, but any major change to software brings its own set of challenges. With respect to hardware and the new architecture, I have personally had to learn and understand a number of new things in a short space of time. Getting to grips with virtualization and the use of Solaris Zones was challenging, especially having worked in a Linux environment for some time. This demanded a new approach to installing and maintaining the new Hub software. Snapshot backups and cluster failover configuration are just some of the new procedures to become familiar with on the new hardware.
The Hub software installation procedure was quite a time-consuming process, as the automated method to install the Cheshire base and Cheshire Hub software was designed to work on Linux rather than Solaris. It was therefore necessary to build all Hub software libraries manually and also to diagnose, fix and document any specific Cheshire / Solaris 10 related issues along the way.
Once the installation had been completed, and after lots of testing, it became clear that the processors on the new machine were not fast enough during Hub searches to provide the expected response times. The solution we opted for was to set up the Hub service on a Sun M4000 server, which had fewer but more powerful processors. We saw some improvement in search speed, but not enough. I therefore changed the Hub configuration so that our vspoke databases were split between different ports, and this helped improve Hub performance during searching.
Lessons Learnt
Hindsight is a beautiful thing, and if I were to offer advice to a colleague carrying out a similar project, or if I had to repeat the process, I would definitely try to avoid making so many core changes to a service all at the same time. However, sometimes the timing and the situation you are faced with dictate the order in which things have to be approached.
The major issues that we faced were with regard to Hub performance and search speeds.
Resolving these issues required a combined solution: transferring the Hub to a different server and making additional software configuration changes. This certainly required patience and careful consideration in order to come up with the best solution.
When working with old, familiar software it is easy to just follow the same familiar approaches and ways of doing things. However, the major changes to the Hub software have required me to adopt new methods and learn new technical skills, which I think has helped to motivate and inspire me in this area of work.
During the process of going live with the new and improved Hub, we moved to a different operating system and hardware architecture, new application software (Cheshire 3) and a new website design, any one of which would prove a challenge for a service to deal with. So despite the challenges I have already mentioned, having accomplished all of these core changes at the same time has given me a very satisfying sense of achievement.
The whole process has given me a new enthusiasm for my work and for the future of the Hub and how we can expand and improve on what is a very exciting and successful service.