May 9, 2024: Showcase – Computational Archival Storytelling with Jupyter Notebooks

On May 9, 2024, a public event showcased original and innovative work by MLIS graduate students in the INST742 class (“Implementation of Digital Curation”) at the University of Maryland iSchool. The class, designed to give students hands-on learning experiences with real-world environments and examples that touch on significant areas of digital curation, concluded with a 2-week final project. Computational processing topics explored in INST742 included:

  • Archival Science concepts and workflows and Computational Thinking (CT)
  • Digitization management (ABBYY FineReader)
  • Cleaning & Transforming (OpenRefine)
  • Clustering algorithms (Artificial Intelligence)
  • Generative AI (NotebookLM)
  • Text Processing through NLP and NER (GATE: General Architecture for Text Engineering)
  • Geospatial Transformations through: geocoding, geolocating, georeferencing, and vectorizing/tracing (QGIS, ArcGIS)
  • Data visualization (Tableau Storyline and Tableau Dashboard)
  • Network analysis through graph databases (Neo4j)
  • Digital Curation at scale
  • Virtual machines (Azure Labs, Sandbox tools, Jupyter Notebooks)

This year again, the 15-week class was organized around a single collection: a sample of the 1911 Charlotte, NC city directory. Students could continue with this dataset or choose one of their own.

They were joined by 7 members of the IMLS-funded TALENT Network project (educators, archivists, and technologists) and the AIC Research Network.

  1. Mark Conrad: Former digital archivist at the National Archives, Advanced Information Collaboratory (AIC)
  2. Greg Jansen: Senior Research Software Architect, U. Maryland iSchool
  3. Jennifer Proctor: Doctoral student, Advanced Information Collaboratory (AIC), and Faculty Research Specialist, ARLIS
  4. Karen Gracy: Professor of Archival Studies, Kent State University
  5. Bill Underwood: Research Scientist, Advanced Information Collaboratory (AIC)
  6. Lori Perine: Doctoral student, Advanced Information Collaboratory (AIC)

Mark Conrad shared that:
“After 14 weeks of classes, students with little to no prior hands-on experience with digital curation tools, developed thoughtful, well-executed projects over the course of only two weeks. They then presented their projects in a very professional manner to an audience of digital curation researchers and educators. Congratulations to all the students and their instructor!”

Showcase talks and topics included:

  1. Rachel HICKS / Shelly JUSTEMENT / Julia WEBSTER: Visualizing B&O Railroad’s Relief Department Records
  2. Nora DUNNE: Mapping City Shifts Over Time
  3. Frank FIORE: Laund, Laundress, & Laundryman: Examining Race and Gender Demographics in Laundry Service Jobs in Charlotte, NC 1911
  4. Katie RUFFING: Beyond the Census: Identifying and Mapping Women’s Work in 1911 Charlotte
  5. Wren LUGO: Madams and Boarding: A continued examination of ‘Madams’ and Boarding Houses as overlooked careers in the 1911 Charlotte Directory
  6. Dorothy TANG: Mapping the Lives of Library Workers of Carnegie Library in 1911
  7. Anna SZAPIRO: Am I Hallucinating, or Is Someone in My Room? Using ChatGPT to Analyze Multi-Family Dwellings in 1911 Charlotte, NC
  8. Scott DRANGINIS: Race and Railroad Careers in the Early 20th Century U.S. South
  9. Jessica PERKINS: Exploring Charlotte’s Boarding Homes
  10. Leah SIMS: Gender and Race Disparities in Business Ownership: What the telephone can tell us about business ownership in Charlotte, 1911
  11. Erin RAMOS: Silences: Unmarried Women and Black Families in 1911 Charlotte
  12. Lindsay OLIVER: Investigating Redlining Trends in Early 20th Century Charlotte

1. Visualizing B&O Railroad’s Relief Department Records

  • Authors: Rachel HICKS / Shelly JUSTEMENT / Julia WEBSTER
  • Abstract: We would like to work with an index of Relief Department records from the B&O Railroad Museum’s archives, which describes the records of one of the earliest forms of employer-based health insurance. B&O Railroad employees would apply to participate in the Relief Department and, based on a medical exam, would contribute a certain amount of money each month to the department. If the individual was injured on the job or fell ill, which happened fairly frequently, they could submit a claim and, if approved, receive insurance benefits. The indexed records originate from the early twentieth century but can contain files dating into the mid/late twentieth century. The physical paper records contain much medical information like doctor’s notes and medical assessments but, for privacy reasons, the index excludes any medical information. The dataset includes almost 12,000 rows which contain the individual’s name, birth date, race/ethnicity/gender (sometimes), job title/location, home location, and more. Our motivation for this project comes from Rachel’s work at the B&O Railroad Museum where she’s participating in the creation of the index of Relief Department records. Research questions include:
    • What types of individuals worked for the B&O Railroad in the twentieth century?
    • What jobs did workers hold, where did they live, and how much did they contribute to the Relief Department?
  • Dataset: Full datafied Directory (16,000 entries), Sanborn map
  • Tools: OpenRefine, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/3vZsRQNKJls (13′ 56″)
2. Mapping City Shifts Over Time

  • Author: Nora DUNNE
  • Abstract: This project aims to understand changing urban areas by looking at the street index of the Charlotte 1911 phonebook and comparing and contrasting it to a modern map of Charlotte. Investigations will include looking at clusters of wealth, business districts, racial segregation, and urban development and renewal.
  • Dataset: Full datafied Directory (16,000 entries), Sanborn map
  • Tools: ChatGPT, QGIS, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/OO4FiC55SxU (9′ 50″)
3. Laund, Laundress, & Laundryman: The Demographics of Laundry Service Jobs and Businesses in Charlotte, NC 1911

  • Author: Frank FIORE
  • Abstract: This project examines data pertaining to the 131 laundry service jobs found in the Charlotte, North Carolina City Directory of 1911. It considers the job title abbreviation “laund,” as well as “laundress” and “laundryman,” which are analyzed alongside individual demographic data for race and gender. The results are then compared with similar demographic data from the full dataset, revealing laundry service jobs as the most common job held by Black women.
  • Dataset: General directory (16,000 entries)
  • Tools: OpenRefine, ChatGPT, and Jupyter Notebook
  • Video: https://youtu.be/PAnFhLrbRDY (15′ 37″)
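
The filtering and comparison described in this abstract could be sketched in a notebook cell roughly as follows. This is a minimal illustration, not the author's notebook: the column names (`occupation`, `race`, `gender`) and the sample rows are hypothetical placeholders, and only the three job titles come from the abstract.

```python
from collections import Counter

# Placeholder rows standing in for directory entries. The column names and
# values are assumptions for illustration, not the class dataset's schema.
rows = [
    {"name": "A", "occupation": "laund", "race": "Black", "gender": "F"},
    {"name": "B", "occupation": "Laundress", "race": "Black", "gender": "F"},
    {"name": "C", "occupation": "laundryman", "race": "White", "gender": "M"},
    {"name": "D", "occupation": "clerk", "race": "White", "gender": "M"},
]

# The three job titles named in the abstract.
LAUNDRY_TITLES = {"laund", "laundress", "laundryman"}


def is_laundry_job(row):
    """Match a row's occupation against the laundry titles, ignoring case and padding."""
    return row["occupation"].strip().lower() in LAUNDRY_TITLES


def demographic_counts(entries):
    """Count entries per (race, gender) pair."""
    return Counter((r["race"], r["gender"]) for r in entries)


laundry_rows = [r for r in rows if is_laundry_job(r)]
laundry_counts = demographic_counts(laundry_rows)
overall_counts = demographic_counts(rows)

# Compare each group's count among laundry jobs with its count in the full sample.
for group, count in laundry_counts.items():
    print(group, count, "of", overall_counts[group])
```

Normalizing case and whitespace before matching is one way to keep “Laundress” and “laund ” in the same bucket; an OpenRefine text facet would serve the same purpose.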
4. Beyond the Census: Identifying and Mapping Women’s Work in 1911 Charlotte

  • Author: Katie RUFFING
  • Abstract: Through my project, I intend to identify and uncover the roles that women held in the workforce in 1911 Charlotte. I will start by utilizing OpenRefine to identify all women in the dataset and create a separate gender column. I will do this by not only utilizing the “Married” column but also by identifying common names for women at the time. From there, I will use Tableau to identify and conduct exploratory data analysis on the various occupations held by women. I will then utilize QGIS to map the commute of some of these women, shedding light on their daily journeys to and from work. Through this, I hope to identify and communicate a nuanced understanding of women’s labor and mobility in 1911 Charlotte.
  • Dataset: Extracted Charlotte 1911 dataset, 1911 Charlotte Business Directory, 1911 Charlotte General Directory, 1911 Charlotte Street Directory, 1911 Charlotte Sanborn Map, Google Map Charlotte.
  • Tools: OpenRefine, Tableau, Neo4j, QGIS, ChatGPT, and Jupyter Notebook
  • Video: https://youtu.be/xPlEEvaCmxQ (19′ 09″)
5. Madams and Boarding: A continued examination of ‘Madams’ and Boarding Houses as overlooked careers in the 1911 Charlotte Directory

  • Author: Wren LUGO
  • Abstract: A block study centered on ‘Spring’ street, using the business, street, and general directories to provide a more focused demographic or socio-economic picture and/or pictures. The project will possibly elicit more about the ‘madam’ profession as multiple individuals residing around this block held that profession. Geolocating or GIS mapping might be possible via QGIS considering that this project will use a portion of the map centered around ‘Davison’ and ‘First’ streets. Tableau will help visualize graphs that highlight the data and cross-comparisons.
  • Dataset: The Charlotte 1911 business, street, and general directories. The extracted Excel Directory. The historic Charlotte Sanborn map data.
  • Tools: QGIS, Tableau, OpenRefine, Excel, and Jupyter Notebook
  • Video: https://youtu.be/PnoT1V_V4ss (21′ 01″)
6. Mapping the Lives of Library Workers of Carnegie Library in 1911

  • Author: Dorothy TANG
  • Abstract: The goal is to map the locations of library workers of Carnegie Library to see where they lived in relation to each other and the library. Using a Jupyter notebook, I will provide some historical context about Carnegie Library, particularly related to the “colored branch” of the library.
  • Dataset: Charlotte 1911 business directory and general directory
  • Tools: Excel, QGIS, and Jupyter Notebook
  • Video: https://youtu.be/o942B6ZUuUg (11′ 12″)
7. Am I Hallucinating, or Is Someone in My Room? Using ChatGPT to Analyze Multi-Family Dwellings in 1911 Charlotte, NC

  • Author: Anna SZAPIRO
  • Abstract: This experimental project tests the potential of GenAI and, specifically, ChatGPT-4 to clean data stored in .csv format and identify and analyze relationships within large datasets using a sample from the 1911 City Directory for Charlotte, North Carolina. Taking individuals living in shared housing in Charlotte as a jumping-off point, I explore ways researchers might engage ChatGPT to discover and communicate demographic trends and reconcile problems in unwieldy datasets.
  • Dataset: Charlotte 1911 Directory
  • Tools: ChatGPT, Microsoft Excel, and Jupyter Notebook
  • Video: https://youtu.be/ThDb-Vvylqo (17′ 39″)
8. Race and Railroad Careers in the Early 20th Century U.S. South

  • Author: Scott DRANGINIS
  • Abstract: This project intends to analyze the job position disparity between black and white railroad workers for Southern Railway Co. in Charlotte, NC, during 1911. OpenRefine is used to sort the datafied directory down to just those railroad workers. From there, I will plug the data into Tableau and create a story that visualizes what types of jobs black and white workers shared and which ones were only held by one race or the other. My motivation for this is a mix of my personal interest in railways and railroads, and the hope that this sort of study could be valuable for others hoping to study race and how it impacted hiring in one of the most impactful industries of early 20th century America.
  • Dataset: The Charlotte 1911 Business directory and general directory, the full datafied Excel directory, the index of other directories (purely for checking potential acronyms)
  • Tools: OpenRefine, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/unNTJtaHTAQ (14′ 47″)
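
The comparison of shared and exclusive job titles described in this abstract can be expressed with set operations. The sketch below is only an assumption about how such a step might look in a notebook: the column names and the job titles (`job_a`, `job_b`, `job_c`) are invented placeholders and do not reflect findings from the directory.

```python
from collections import defaultdict

# Placeholder records for railroad workers already filtered from the directory.
# Column names and job titles are hypothetical, not data from the project.
workers = [
    {"occupation": "job_a", "race": "Black"},
    {"occupation": "job_a", "race": "White"},
    {"occupation": "job_b", "race": "White"},
    {"occupation": "job_c", "race": "Black"},
]


def titles_by_race(entries):
    """Group the distinct (normalized) job titles under each race label."""
    groups = defaultdict(set)
    for entry in entries:
        groups[entry["race"]].add(entry["occupation"].strip().lower())
    return groups


groups = titles_by_race(workers)

# Titles held by both groups, and titles held by only one of them.
shared = groups["Black"] & groups["White"]
only_black = groups["Black"] - groups["White"]
only_white = groups["White"] - groups["Black"]

print("shared:", sorted(shared))
print("only Black workers:", sorted(only_black))
print("only White workers:", sorted(only_white))
```

The three resulting sets map directly onto the categories a Tableau story could visualize: jobs both groups held and jobs held by only one group.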
9. Exploring Charlotte’s Boarding Homes

  • Author: Jessica PERKINS
  • Abstract: This project will focus on boarding homes within 1911 Charlotte, NC. Extracting data from the 1911 Charlotte directory, I plan to explore the demographics of both those who owned boarding homes and those who lived in them. I plan to use QGIS to map where certain boarding homes were located, Tableau to explore details of boarders, and Neo4j to explore any connections between related individuals. The goal of this project is to examine these connections between individuals and get a better understanding of boarding homes through multiple lenses of geography, race, and gender, among others.
  • Dataset: Charlotte 1911 City Directory
  • Tools: ABBYY, OpenRefine, QGIS or Neo4j, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/SSFpogOrUjA (21′ 22″)
10. Telephone Numbers in Relation to Business, Charlotte 1911

  • Author: Leah SIMS
  • Abstract: The intention behind this project is to visually analyze the gap in telephone ownership among 1911 Charlotte businesses. Gender and race were major considerations in 1911, meaning the majority of businesses were owned by white men. I wish to focus not on who had a telephone but on who did not. I will show this relationship by comparing the visuals for overall business ownership with those for telephone ownership.
  • Dataset: Charlotte 1911 business directory
  • Tools: OpenRefine, Neo4j, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/39qySUb_Yd4 (16′ 33″)
11. Silences: Unmarried Women and Black Families in 1911 Charlotte

  • Author: Erin RAMOS
  • Abstract: This project seeks to uncover the silences of Black families or family relations in the 1911 Charlotte directory. While White women are mentioned in connection to men (i.e., marital status), that same status is not afforded to Black women, which leads to a silence of relations and families. Their relations, one might deduce, are seen as unimportant or unneeded information, due to the silence on the matter. I seek to find where these families lived, uncover a richer understanding of their lives, bring back connections, and give the families visibility to those who may be looking for them today.
  • Dataset: Charlotte Street and General Directory, 1911/10 Census
  • Tools: OpenRefine, QGIS, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/3vZsRQNKJls (11′ 18″)
12. Investigating Redlining Trends in Early 20th Century Charlotte

  • Author: Lindsay OLIVER
  • Abstract: This project will employ data from the Charlotte City Directory, U.S. Census Records, and Records of the Federal Home Loan Bank Board to visualize trends in redlining in early 20th century Charlotte.
  • Dataset: Charlotte City Directory, U.S. Federal Census Records, Home Owners’ Loan Corporation (HOLC)
  • Tools: ABBYY, OpenRefine, QGIS or Neo4j, Tableau, and Jupyter Notebook
  • Video: To be posted.

-Authored by Richard Marciano