On August 11, 2021, members of the University of Maryland Digital Curation for Information Professionals (DCIP) Certificate Program Cohort for 2021 showcased their capstone projects. The DCIP Certificate Program consists of three courses – Introduction to Digital Curation (6 weeks), Tools and Software for Digital Curation (12 weeks) and Implementing Digital Curation in the Workplace (12 weeks). Members of the cohort who successfully complete all three courses are awarded a certificate.

For the third course, cohort members must complete a capstone project in which they use some of the skills and knowledge that they have acquired in the first two courses to implement a digital curation project. The capstone project can take place either at their workplace or as part of another organization or project that interests them.

Robert McD. Parker led off with his presentation, Old Data, New Data: New Perspectives Enriching Historical Data. He discussed his role in a project that had been going on for fourteen years to digitize and make searchable data concerning urban renewal in the Southside neighborhood of Asheville, NC. As a latecomer to the project, Robert was tasked with doing contextual research and testing the interface to the data. He also gave a presentation at the launch of the website and database. You can find the searchable database here by clicking on Remapping at the top of the screen. You can find information about the launch of the website – and Robert’s presentation at the launch – here.

For his capstone project, Sebastian John decided to build on a long-term interest in farmers markets by analyzing and visualizing USDA data on farmers markets in the US. He recorded his project as a blog, which you can find here. In his presentation, he talked about his efforts to find appropriate datasets, transform and clean the data, and present the data in the form of spreadsheets and visualizations.

Next, Rachel McNellis gave her presentation entitled From Obsolete to Accessible: Adding the LMLO to the Cantus Database. For this project, Rachel took data originally collected by Andrew Hughes – the Late Medieval Liturgical Offices (LMLO). She found a version of the data that had been converted to a CSV file from its original, published form as a FileMaker database. Working with the CSV file, she was able to substantially clean and transform the data into a form that could be published in Cantus. Cantus is an online database of Latin Ecclesiastical Chant from manuscript sources. It contains more than half a million entries.

Kathryn Burke then gave her presentation, Re-Representing the Historical St. Anne’s Cemetery in Annapolis, MD. Katie’s project involved taking hard-copy data concerning people buried in the cemetery, OCRing it, combining it with other data, and using current and historical maps to relate the data to individual headstones and their locations in the cemetery. This is an ongoing project because of the large volume of data to be wrangled, but Katie is hooked and will continue to work on it for the foreseeable future.

Next up was Margaret Doyle talking about her work on the Morgenthau Holocaust Collections Project. Henry Morgenthau, Jr. was President Franklin D. Roosevelt’s Secretary of the Treasury, friend, and head of the War Refugee Board. One rich source of Holocaust information is the Morgenthau Diaries. This project is applying machine learning (ML) algorithms to the digitized content of 864 volumes of diaries to provide better access to the subjects covered in the diaries. For part of her capstone project, Margaret used the HMJr Verifier app to validate and clean up the results of the ML work. The project team is also preparing Jupyter Notebooks about the Morgenthau diaries. As another part of her capstone project, Margaret is reviewing these notebooks and making suggestions on how to make them more understandable to a non-computer-science audience.

Nickoal Eichmann-Kalwara then gave a presentation on a project she is working on, Digital El Diario. She is working with the recently digitized version of El Diario de la Gente, 1972-1983, an independent newspaper for the Chicanx community at the University of Colorado, Boulder. She and her students have created 900+ plain-text versions of articles, poetry, artworks, cartoons, and advertisements from the OCR’d text and are examining the data from different perspectives.

Finally, Emily Cavanaugh discussed her project working with the manifest for the RMS Titanic. She began by downloading data from the Encyclopedia Titanica. She then cleaned the data, transformed some of it, and created various visualizations using Tableau.

The cohort did a great job of demonstrating what they learned in the DCIP courses and beyond. They creatively applied digital curation tools and methods to a wide variety of data across a diverse range of subject matter.

Authored by Mark Conrad