May 5, 2022: Digital Curation Project Showcase Event
On May 5, 2022, a public event showcased original and innovative work by MLIS graduate students in the INST742 class (“Implementation of Digital Curation”) at the University of Maryland iSchool. The class, designed to give students hands-on learning experiences with real-world environments and examples that touch on significant areas of digital curation, concluded with a two-week final project. The digital curation implementation topics explored in INST742, and applied to a small sample of the 1911 Charlotte, NC city directory, included:
Archival Science concepts and workflows and Computational Thinking (CT)
Digitization management (ABBYY FineReader)
Cleaning & Transforming (OpenRefine)
Data Wrangling (Trifacta)
Clustering algorithms (Artificial Intelligence)
Text Processing through NLP and NER (GATE: General Architecture for Text Engineering)
Geospatial Transformations through: geocoding, geolocating, georeferencing, and vectorizing/tracing (QGIS, ArcGIS)
Data visualization (Tableau Storyline and Tableau Dashboard)
Students built on this earlier coursework to explore a broader range of digital curation techniques and collections, leading to the design of ten original and outstanding projects:
1. Mieko PALAZZO & Gabrielle PUGLISI: “Visualizing Gender and Employment in the 1911 Charlotte City Directory using Graph Databases”
The focus was on scaling up the original dataset to 500 entries, designing and creating a social network of people, places, jobs, gender, marital status, companies, race, and housing, and demonstrating “needle-in-a-haystack” graph searches using the Cypher query language.
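A "needle-in-a-haystack" graph search can be illustrated without a graph database. The sketch below is a minimal plain-Python stand-in, not the project's Cypher queries: the relationship names (`WORKS_AS`, `LIVES_ON`), the helper functions, and the sample triples are all hypothetical and do not come from the directory.

```python
from collections import defaultdict

# Hypothetical (subject, relationship, object) triples; invented for illustration.
TRIPLES = [
    ("Person A", "WORKS_AS", "printer"),
    ("Person A", "LIVES_ON", "Example St"),
    ("Person B", "WORKS_AS", "laundress"),
    ("Person B", "LIVES_ON", "Example St"),
    ("Person C", "WORKS_AS", "printer"),
    ("Person C", "LIVES_ON", "Sample Ave"),
]


def build_index(triples):
    """Index subjects by (relationship, object) so a lookup avoids a full scan."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(relation, obj)].add(subject)
    return index


def find_people(index, **criteria):
    """Return the sorted subjects that satisfy every relationship=object criterion."""
    matches = None
    for relation, obj in criteria.items():
        found = index.get((relation, obj), set())
        # Intersect so that only subjects matching all criteria remain.
        matches = found if matches is None else matches & found
    return sorted(matches or [])
```

Calling `find_people(build_index(TRIPLES), WORKS_AS="printer", LIVES_ON="Example St")` narrows the sample to the one person who satisfies both conditions. A graph database expresses the same idea as a pattern match over nodes and relationships.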
2. Monique BROOKS: “Recreating Community in African American Laborers in the 1911 Charlotte City Directory”
The project uses OpenRefine to subset 16,000 entries into 1,300 “African American Laborer” records to better understand place, community, and kinship in 1911 Charlotte, NC. It proposes an original methodology for recreating community: clustering individuals spatially by filtering on the street where they were housed.
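The idea of clustering individuals by street can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's OpenRefine workflow: it assumes each record is a (name, address) pair whose address begins with a house number, and the function names and sample addresses used below are hypothetical.

```python
import re
from collections import defaultdict


def street_of(address):
    """Drop a leading house number (e.g. '12', '12a', '12 1/2') and normalize case and spacing."""
    street = re.sub(r"^\s*\d+[a-zA-Z]?(\s+\d/\d)?\s+", "", address)
    return " ".join(street.split()).upper()


def cluster_by_street(records):
    """Group names by normalized street, preserving input order within each street."""
    clusters = defaultdict(list)
    for name, address in records:
        clusters[street_of(address)].append(name)
    return dict(clusters)
```

For example, `cluster_by_street([("Person A", "412 Example St"), ("Person B", "414 example st")])` places both people in the same `EXAMPLE ST` cluster. Grouping on the normalized street name rather than the raw address keeps neighbors together despite differences in house number or capitalization.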
3. Kiley MEAD: “Documenting the Printing Industry through the 1905 Baltimore City Directory”
The project uses the 1905 Baltimore City Directory to identify all printing-related jobs in order to recreate the vibrant landscape of the newspaper industry, including journalist, reporter, editor, photographer, news director, librarian, pressfeeder, pressman, presshand, paper carrier, advertising writer, newsdealer, bookbinder, printer, engraver, lithographer, news carrier, die setter, typemolder, typemaker, proofreader, coffee roaster, linotype operator, etc. The project then uses Google Earth to geolocate, map, and visualize printers, reporters, typecasters, lithographers, librarians, pressfeeders, news carriers, bookbinders, editors, engravers, and linotype operators.
4. Krystyne DYADYURA: “Digital curation of Born-Digital Materials Based on Realtime Events Related to the Destruction of Buildings in Ukraine”
The project set out to design an application that compiles and maps information on the Russian bombardment of cities in Ukraine. The information sources analyzed included Twitter, Telegram posts, and other social media. A database was assembled, allowing for the creation of maps and interactive analysis through Tableau. The project also explored a controlled vocabulary for events in Ukraine (church, TV tower, government buildings, consulates, oil depots, etc.), with Google Earth visualizations that include parameters such as number of buildings, building types, locations, and frequency over time.
5. Nikki PRATT: “Analyzing 19th Century Funerals through Church Registers”
The project analyzes funerals listed in the register of Calvary Episcopal Church (Dinwiddie Co., VA). Tools included OCR (ABBYY FineReader), data cleaning (OpenRefine), and visual analysis (Tableau Storyline). Research questions covered:
exploring infant and child mortality changes from the mid- to late 19th century.
looking at the impact of the Civil War.
examining soldier-aged male deaths during the Civil War.
expanding the study to hot months in the summer, marital status, and gender.
6. Noah SCHEER: “Mapping the 1910 Census in Charlotte, NC”
The project focuses on a particular 1910 Census Enumeration District in Charlotte, NC. It datafies the content using FamilySearch.org and uses QGIS to geolocate over 1,000 people onto a modern OpenStreetMap base map and a georeferenced 1910 Sanborn Fire Insurance Map. Integrating these datasets allows historic census data to be categorized by gender, race, birthplace, and marital status, and the information content of the 1910 Census to be compared with that of the 1911 City Directory.
7. Kamilah ZISCHANG: “Understanding the Lives of African American Women in 1911 Charlotte, NC”
The project is concerned with identifying and following the lives of African American women over time and space in Charlotte, NC in 1911. The project selects a subset of women and traces them through Census records (1910), Sanborn Fire Insurance maps (1905, 1911, 1929), and City Directories (1911, 1918). Information integration, geocoding, geolocation through QGIS, and visualization through Tableau are demonstrated. This represents a proof-of-concept for elevating the lives of often underrepresented individuals through information integration and digital curation techniques.
8. Lacey HALL: “Digital Curation at Scale on the 1911 Charlotte City Directory” ** Annabel THOMPSON, Bo LENHARDT, Britton SCHAMS **
This project looks at digital curation processes at scale using the entire datafied 1911 Charlotte City Directory, with close to 16,000 entries. OpenRefine is demonstrated at scale, identifying jobs and companies, further refining the data, and separating fields out into house locations, phone numbers, gender, etc. This is a demonstration of data cleaning at scale: generalizing patterns through regular expressions, and testing the resiliency of models developed on smaller samples.
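Separating fields with a regular expression, and checking how well a pattern holds up on new data, might look like the Python sketch below. The entry layout it assumes ("Surname Given (Spouse), job, h/b/r address") is a hypothetical simplification rather than the directory's documented format, and the pattern and function names are illustrative only.

```python
import re

# Assumed layout: "Surname Given (Spouse), job, h|b|r address"; hypothetical, for illustration.
ENTRY = re.compile(
    r"^(?P<surname>[A-Za-z'-]+)\s+(?P<given>[^,(]+?)\s*"
    r"(?:\((?P<spouse>[^)]+)\)\s*)?,\s*"
    r"(?P<job>[^,]+?)\s*,\s*"
    r"(?P<tenure>[hbr])\s+(?P<address>.+)$"
)


def parse_entry(line):
    """Return a dict of named fields, or None when the line does not fit the assumed layout."""
    match = ENTRY.match(line.strip())
    return match.groupdict() if match else None


def unmatched(lines):
    """Collect lines the pattern cannot parse, so they can be reviewed by hand."""
    return [line for line in lines if parse_entry(line) is None]
```

Running `unmatched` over a larger batch is one simple way to see where a pattern developed on a small sample stops generalizing: every line it returns points to a layout the pattern does not yet cover.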
9. Colleen BRENNER: “1911 Charlotte City Directory At Scale: Refining and Geo-Coding”
The project uses OpenRefine and the Texas A&M geocoder to show how 12,000 addresses can be validated and mapped automatically.
10. Mary LARSON: “Mapping Segregation in the 1914 Charlottesville, VA City Directory”
The project uses OCR and spatial representation to reinterpret 1914 Charlottesville.