This post is a copy of my final project for DIG 550 Digital Preservation, for the University of Maine’s online graduate Digital Curation Certificate. The ideas and opinions expressed here are my own and do not represent any official adoption or current project of Vermont Law and Graduate School.
At Vermont Law and Graduate School (VLGS), the library hosts two archival collections: our institutional archive, with materials relating to the history of the school, and a faculty scholarship collection, consisting of books, articles and other scholarly publications. We would like to digitize these materials for preservation and access, and I wanted to evaluate one potential platform to use in this work.
I chose DSpace™ because I wanted to try an open-source solution for our digital repository and digital archive preservation needs. DSpace can ingest many different file types, including moving-image, audio, still image, text, and other formats. This makes it a potentially powerful and flexible option for a multi-format archive. The development group behind DSpace, Lyrasis, has developed other platforms, including ArchivesSpace and Fedora. There is no publicly-hosted demo of these, so I was not able to test them.
Testing DSpace
Installation of DSpace is complex and requires sysadmin-level access to a server as well as knowledge of Linux commands. Even though the interface is web-based, there is no “one-click install” or drag-and-drop setup process for this platform, and the instructions were beyond my comfort level given the time allotted. I checked with our IT department about their availability to help, but they were not prepared to set up a test server on short notice.
Fortunately, I was able to use the DSpace public demo site, which is fully enabled with all features and functions. Unfortunately, because it’s an open and public demo, I couldn’t be sure that my test data would stay persistent over days. Items and collections I built were erased by other testers (or possibly Lyrasis staff), so I had to conduct tests piecemeal. At one point I realized I couldn’t create new Communities or Collections, only upload Items to previously-established ones.
On the positive side, it was easy to intuit how to use the system and where to put files based on the options available. The structure of DSpace is hierarchical:
- Admin is for overall access control and maintenance of the system (institution level);
- Community and Collection tiers serve an organizational function;
- Publication level is for specific publications within a collection of similar content;
- Item level is where files are uploaded, described, and stored.
While it was easy to understand the basic architecture of DSpace as a platform, I realized that the system’s flexibility also means that users have to define an organizational structure for their content and carefully plan out how to implement it. Specifically, Item Type, Subject, and Subject Category could be used for various item attributes, and could vary at the Collection or Community level. Defining these designations ahead of time will keep the archive organized and make sure that it is searchable as it grows. For these fields as well as other more standard ones (Author, Date, etc.), I would use a standardized metadata format such as MODS or Dublin Core, together with a controlled vocabulary, to keep the data uniform for better parsing, searching, and interoperability.
Other helpful features of the system include the ability to import metadata from an external source. As an example, if we had a spreadsheet with title, author, date, and subject information, that would not have to be re-created. DSpace also supports discovery via the OAI-PMH interface and Google Scholar optimizations.
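To make the spreadsheet example concrete, here is a minimal Python sketch of how existing rows might be rewritten under Dublin Core column headings before import. I was not able to test an import on the demo site, so treat the details as assumptions: the spreadsheet column names, the `COLUMN_MAP` mapping, the placeholder collection handle `123456789/1`, and the sample row are all invented for illustration, and the `id`, `collection`, and `||` conventions reflect my understanding of DSpace's batch metadata CSV format, which should be checked against the documentation for the version in use.

```python
import csv
import io

# Hypothetical mapping from our spreadsheet's column names to Dublin Core
# field names; adjust it to the metadata schema configured in your instance.
COLUMN_MAP = {
    "Title": "dc.title",
    "Author": "dc.contributor.author",
    "Date": "dc.date.issued",
    "Subject": "dc.subject",
}

# Assumption: only these spreadsheet columns hold several values per cell,
# separated by ";" in our (invented) spreadsheet.
MULTI_VALUED = {"Author", "Subject"}


def to_dublin_core_csv(spreadsheet_csv: str, collection_handle: str) -> str:
    """Rewrite spreadsheet rows under Dublin Core column headings."""
    reader = csv.DictReader(io.StringIO(spreadsheet_csv))
    out = io.StringIO()
    fieldnames = ["id", "collection"] + list(COLUMN_MAP.values())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumption: "+" in the id column marks a new item to be created.
        record = {"id": "+", "collection": collection_handle}
        for source, target in COLUMN_MAP.items():
            value = (row.get(source) or "").strip()
            if source in MULTI_VALUED:
                # Join multiple values with "||" (assumed import separator).
                parts = [p.strip() for p in value.split(";") if p.strip()]
                value = "||".join(parts)
            record[target] = value
        writer.writerow(record)
    return out.getvalue()


# Made-up sample row, for illustration only.
sample = (
    "Title,Author,Date,Subject\n"
    "Yearbook 1976,Example Author,1976,Yearbooks; Students\n"
)
result = to_dublin_core_csv(sample, "123456789/1")
print(result)
```

The point of the sketch is that the descriptive work already done in a spreadsheet carries over: only the column headings and the multi-value separator change, not the data itself.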
In addition, the system automatically generates permanent URLs at the Community, Collections, Publication, and Item levels, and one can also attach unique permanent identifiers such as Digital Object Identifiers (DOI).
Limits of DSpace
At the platform level, the DSpace software is free and open-source, so there is a community of users and volunteer developers around it. There is additional support from Lyrasis itself, but not as much official help desk-type support as a commercial product would likely have. For heavily-customized instances of DSpace, it will be especially important for institutions to document their data architecture and decision-making reasons at the outset, and – as a particular project or collection is developed – to keep the data consistent.
DSpace is “owned” or sponsored by Lyrasis, a non-profit company that functions similarly to a consortium, with paid membership. While they have been around in various forms for about 80 years, shifts in their priorities or mission could affect specific products like DSpace.
As with other open-source software, DSpace can be self-hosted, or an institution can pay Lyrasis or another approved vendor to host it.
Self-hosting requires more labor and understanding of the underlying server, plus processes and technology for backup, file checking and verification, and migration or updates. In exchange, a self-hosted instance may allow for greater flexibility and customization. Outside hosting offers the set-and-forget convenience of offloading management tasks like security, backups, and updates, but requires funding and creates a dependency on an outside group to keep the system running as desired.
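As an illustration of what "file checking and verification" could look like on a self-hosted server, here is a minimal Python sketch of a checksum (fixity) check that is independent of whatever DSpace itself provides. The function names, the file names, and the idea of keeping the manifest as a simple dictionary are my own assumptions for the example, not part of DSpace.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def find_changes(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose checksum no longer matches."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )


# Demonstration with throwaway files standing in for scanned pages.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "yearbook_1976_p001.tif").write_bytes(b"page one")
    (archive / "yearbook_1976_p002.tif").write_bytes(b"page two")

    manifest = build_manifest(archive)          # recorded at ingest time
    unchanged = find_changes(archive, manifest)

    # Simulate silent corruption of one file, then re-check.
    (archive / "yearbook_1976_p002.tif").write_bytes(b"corrupted")
    changed = find_changes(archive, manifest)

print(unchanged, changed)
```

In practice the manifest would be saved somewhere separate from the files it describes and the check run on a schedule, which is exactly the kind of recurring task that outside hosting would take off our hands.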
The built-in fields mentioned above are not customizable. In other words, the term Community is fixed; an institution using DSpace has to define a Community and cannot call it something else (such as Department, Audience, or Research Area). This may cause confusion with some applications of DSpace for groups that are building something other than an academic institutional repository.
I also noticed that DSpace can react very slowly. Upload times, searching, and rendering documents sometimes had noticeable lag. This may be a function of the test environment itself or it may be due to the way the platform is built.
Recommended Uses
DSpace is built as an institutional repository, and I would use it that way for our collection of scholarly publications from our faculty. While ArchivesSpace or Fedora might better suit archival purposes, DSpace appears to be flexible enough to work as an archive as well, and it may be easier to have a single system rather than separate ones for archives and scholarship.
Currently, our physical archive and scholarly repository are closed collections, to protect the items, many of which are unique. Members of the VLGS community may request to view specific items by appointment, but the archive is not open for browsing. While it may seem natural to allow anyone to access digital versions of the archive after documents are scanned, we would probably maintain a closed or semi-restricted model for the digital version as well, with input from our internal Communications and Alumni Relations departments. The archives do contain sensitive information, including photographs of alumni and famous persons, copies of speeches given, as well as internal documents such as minutes from faculty meetings. The institution would like to maintain control over this material and present it in intentionally-curated ways for specific purposes, such as fundraising or alumni relations activities. DSpace provides access controls at the Community, Collections, and Item levels, which will be helpful in this regard.
As mentioned, DSpace is a free, open-source tool with an active community of support. Because we have no budget for a paid service, this tool would allow our institution to get started with an archive or institutional repository without much overhead. We already have server and storage space available that could be partitioned to set up an instance.
In my initial tests of DSpace I developed an outline of how we might use it for our archive materials. Communities define a set of users or audience: who is the digital archive for? A single institution can have multiple communities to serve different audiences.
At VLGS, I see a need for two communities, Archives and Scholarship:
- Archives would focus on the history of the institution (yearbooks, student newspapers, faculty meeting minutes, awards granted to students and faculty, graduation details, degree programs offered, special events and speakers). This set of collections would serve the Alumni, Advancement, and Communications departments, providing access to information that can be used to promote the history and reputation of the institution and to attract new students, faculty, and institutional partners.
- Scholarship would focus on faculty publications: books, journal articles, and other research, and the audiences here would be faculty within and outside the institution as well as students and other researchers interested in legal scholarship.
Breaking down an institution’s overall digital collections into audiences this way helps distinguish how collections are defined and how they relate to each other and to the intended audience. This is a useful intermediate organizational structure between institution and collection.
At the collection level, we find groups of similar items. Our collection of digitized print yearbooks from 1976 – 1991 would be an example of a collection; the digitized version of the alumni magazine would be another, the student newspaper would be a third, and so on. Within a collection you have individual items. One yearbook or one issue of the alumni magazine would be considered an item. In DSpace, each item can be made of multiple files or parts; in these examples, each scanned page would be one file, and all the files for a single yearbook or issue would make up the item, along with its metadata.
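The outline above can be sketched as a small Python data model. This is only my planning aid for thinking through the Community, Collection, Item, and file tiers, not DSpace's actual internal schema; the class names, the file-naming pattern, and the three-page yearbook are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One yearbook or one magazine issue: its files plus its metadata."""
    title: str
    files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """A group of similar items, such as all digitized yearbooks."""
    name: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Community:
    """An audience-oriented grouping of collections."""
    name: str
    collections: list[Collection] = field(default_factory=list)

    def file_count(self) -> int:
        """Total number of files across every item in every collection."""
        return sum(
            len(item.files)
            for collection in self.collections
            for item in collection.items
        )


# A hypothetical three-page yearbook, one file per scanned page.
yearbook_1976 = Item(
    title="Yearbook 1976",
    files=[f"yearbook_1976_p{n:03d}.tif" for n in range(1, 4)],
    metadata={"dc.date.issued": "1976"},
)

archives = Community(
    "Archives",
    [
        Collection("Yearbooks, 1976-1991", [yearbook_1976]),
        Collection("Alumni Magazine"),
        Collection("Student Newspaper"),
    ],
)
scholarship = Community("Scholarship", [Collection("Faculty Publications")])

print(archives.file_count(), [c.name for c in archives.collections])
```

Writing the structure out this way before any upload forces the decisions about what counts as a collection, an item, and a file to be made once, rather than item by item.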
One important caveat is that DSpace does not offer a particularly attractive interface from a public user perspective. However, our archive and faculty scholarship materials are not likely to be made public en masse. We would use DSpace as a back end for organizing and preserving digital files and metadata, and then selectively feed items to another platform (WordPress or Omeka-S, for example) to create digital exhibits or collections for alumni or scholars to browse.
In conclusion, while it would take careful planning and coordination with our IT department to set up, I think DSpace is a flexible and usable tool for preserving faculty scholarship in digital format (whether born-digital or scans from printed materials). I also think it could work in our case as a digital archives platform, and it would be interesting to link the institutional archive with the scholarly one, so that, for example, one could see a particular professor’s activities at the school alongside their published scholarship.