May 15, 2026: Showcase – Computational Archival Storytelling with Jupyter Notebooks

On May 15, 2026, MLIS graduate students in the INST742 class (“Implementation of Digital Curation”) at the University of Maryland iSchool showcased original and innovative work. The class (designed to provide hands-on learning experiences to students, with real-world environments and examples that touch on significant areas of digital curation) concluded with a 2-week final project.

This course is part of a new series of five courses in Computational Archival Science (CAS) (see: https://ai-collaboratory.net/cas/). These courses are specifically designed for MLIS students interested in developing skills in digital curation and computational thinking, but are also suited to INFO master’s students in other programs, doctoral students, and graduate students from other colleges. They currently include:

  1. Coding for Non-Coders: LBSC708F: Introduction to Computational Archival Science (CAS) & Python
  2. GenAI & LLMs: INST728L: GenAI & Large Language Models (LLMs) for Library and Archive Collections
  3. Graphs: INST608D: Using Network Visualization to Explore Library & Archive Collections
  4. Maps: INST608C: Spatial Representation & Analysis for Library & Archive Collections
  5. Digital Curation/Data Science: INST742: Implementing Digital Curation

Students applied the following computational processing topics throughout the semester:

  • Archival Science concepts and workflows and Computational Thinking (CT)
  • Digitization management (ABBYY FineReader)
  • Cleaning & Transforming (OpenRefine)
  • Clustering algorithms (Artificial Intelligence)
  • Generative AI (NotebookLM)
  • Geospatial Transformations through: geocoding, geolocating, georeferencing, and vectorizing/tracing (QGIS, ArcGIS)
  • Data visualization (Tableau Story and Tableau Dashboard)
  • Network analysis through graph databases (Neo4j)
  • Digital Curation at scale
  • Virtual machines (Azure Labs, Sandbox tools, Jupyter Notebooks)

This year again, the 15-week class was organized around a single collection: a sample of the 1911 Charlotte, NC city directory. Students could continue with this dataset or choose one of their own.

Professor Jane Greenberg (Director of the Metadata Research Center at Drexel U. and AI-Collaboratory Co-Founder and CAS Leader) stated: “This work beautifully demonstrates the growing significance of Computational Archival Science (CAS) in supporting research and knowledge discovery. It also highlights the important role CAS plays in bridging archives, AI, and emerging computational approaches. It is inspiring to see students develop and apply critical skills in spatial representation, data analysis, and archival research, demonstrating how computational methods can open new pathways for analysis, interpretation, and knowledge creation. Congratulations to all the students and their instructor for advancing such innovative and important work!”

Nine final student projects were showcased:

  1. Allison SCHOENAUER: Networks of Alumni: Visualizing Employment Trends of Early Farm School Alumni
  2. Jeni CROCKETT-HOLME: Snapshot of Tryon Street in 1911 Charlotte
  3. Eleanor VANDER LAAN: Expanding the Charlotte 1911 Directory Through Reparative Metadata Creation
  4. Caleb HURLEY: Tour De Caleb: Mapping and Visualizing a Cycling Training Season
  5. Emma LYONS: Popular Professions in 1911 Charlotte
  6. Amy DANIEL: Creating and Cleaning Datasets Using GenAI
  7. Connor BUCKLEY, Mati KASSAYE, Ben POLLOCK: Analyzing Language Disparities: Exploring the 1926 Irish Census for the Boola District Electoral Division (DED)
  8. Jada YOLICH: Economic Dimensions of African American Neighborhoods 1911 Charlotte
  9. Gregory SZWARCMAN: Analyzing Migration and Population Movement Records: a Personal Exploration
1. Networks of Alumni: Visualizing Employment Trends of Early Farm School Alumni

  • Author: Allie SCHOENAUER
  • Abstract: Newspapers have been a focus for digital preservation because of the fragility of the material and the amount of information they contain. Recently, the McDonogh School has undertaken a project to digitize its long-running student newspaper, The Week, and make it available online through Digital Maryland, a public history project run out of the Enoch Pratt Library. This newspaper regularly reported on recent developments in the lives and careers of alumni, some of whom had taken up positions in and around Maryland, and some of whom had moved out of state or out of the country.
    This project aims to use software like ABBYY FineReader and OpenRefine to extract and manipulate information about alumni published by The Week. Specifically, it looks at the career and location information shared with the publishers of The Week in order to understand the reach McDonogh School alumni had in the early years of the school. The project also hopes to illuminate relationships that may be obscured by the unstructured nature of the information presented in The Week and to visualize them using Tableau.
  • Dataset: McDonogh School Registry, The Week Vol. 1, Weeks Baltimore Directory 1883
  • Tools: ABBYY FineReader, Neo4j, Tableau, and Jupyter Notebook
  • Video: https://youtu.be/HqlVCC4iry4 (25′ 34″)
2. Snapshot of Tryon Street in 1911 Charlotte

3. Expanding the Charlotte 1911 Directory Through Reparative Metadata Creation

  • Author: Eleanor VANDER LAAN
  • Abstract: One of the notable features of the Charlotte 1911 Directory is that most married women are listed on the same line as their husbands. When datafying the directory using an entry-based approach, a married woman gets buried in the same entry as her husband. Having a husband and wife share the same entry not only takes away the wife’s importance as an adult in the household, but also skews much of the demographic data we are able to extract from the Directory.
    My project to create entries for the wives is inspired by the concept of reparative archival description. Many contemporary archives have been updating collection metadata to give personhood back to women who were only referred to by their husbands’ names. For example, “Mr. and Mrs. John Smith” would become “Mr. John Smith and Mrs. Jane Smith” or even “John Smith and Jane Smith.” This project applies the same concept to the dataset, expanding the entries to give wives their proper recognition.
  • Research questions: (1) How does adding directory entries for the hidden women change the Charlotte 1911 Directory data? (2) How does adding hidden women change the demographic ratios for gender, race, and marriage rates?
  • Datasets: Full 1911 Charlotte City Directory (16,000 entries), Social Security Administration Popular Baby Names Dataset
  • Tools: Excel, OpenRefine, Python, Tableau, Neo4j, and Jupyter Notebook
  • Video: https://youtu.be/qKcPWii76Xo (25′ 14″)
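
The entry expansion described in the abstract can be sketched in Python. This is only an illustration, not the author's actual workflow: the line layout ("Smith John (Jane), clerk", with the wife's name in parentheses), the `expand_entry` helper, and the record field names are all assumptions made for the example, and the real Directory data may be structured differently.

```python
import re

# Hypothetical pattern for a directory line such as "Smith John (Jane), clerk".
# The real Directory layout may differ; adjust the pattern to the actual data.
ENTRY = re.compile(
    r"^(?P<surname>[^\s,]+)\s+(?P<given>[^(,]+?)\s*"
    r"(?:\((?P<spouse>[^)]+)\))?\s*(?:,\s*(?P<rest>.*))?$"
)


def expand_entry(line):
    """Return one record per adult named in a directory line."""
    match = ENTRY.match(line.strip())
    if match is None:
        return []  # leave unparseable lines for manual review
    surname = match["surname"]
    records = [{
        "surname": surname,
        "given_name": match["given"].strip(),
        "details": (match["rest"] or "").strip(),
        "derived_from_spouse": False,
    }]
    if match["spouse"]:
        # The wife gets her own record. Details printed after the husband's
        # name (such as occupation) are not copied, since they describe him.
        records.append({
            "surname": surname,
            "given_name": match["spouse"].strip(),
            "details": "",
            "derived_from_spouse": True,
        })
    return records
```

Keeping a `derived_from_spouse` flag on each new record makes it possible to compare demographic ratios before and after the expansion, which is what the research questions above ask.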
4. Tour De Caleb: Mapping and Visualizing a Cycling Training Season

    • Author: Caleb HURLEY
    • Abstract: This project explores how personal fitness and movement data can be transformed through data curation, visualization, and geospatial analysis. Using 8 selected cycling training sessions from seven months of cycling training data stored in TCX (Training Center XML) files, the project analyzes patterns in cycling activity over time, including route evolution, distance traveled, average speed, elevation gain, and power output.
      The project investigates how born-digital fitness data can be curated and interpreted as both quantitative and narrative evidence of behavioral patterns, physical progression, and geographic movement. Rather than focusing only on statistical summaries, the project emphasizes storytelling through visualizations and mapping. The project asks the following research question: How can personal cycling telemetry data be transformed into a computational narrative about training progression, spatial movement, and physical activity over time?
      The project involves extracting and cleaning TCX ride data, converting GPS coordinate information into usable geographic datasets, and creating visualizations that illustrate trends in training volume and performance. GPS route data from rides will be visualized in QGIS to create maps tracing ride locations over time. Tableau dashboards and graphs will be used to visualize changes in ride distance, average speed, elevation, and power metrics across the 8 selected rides from the seven months.
      The project narrative moves from geospatial data without power metrics, to power-based structured data without geospatial data, then training records with both data types combined. The notebook documents the workflow used to transform raw TCX telemetry data into curated datasets and visual representations. The project also aims to demonstrate how digital humanities and data curation methodologies can be applied to fitness datasets rather than traditional archival or historical records.
    • Dataset: Personal cycling ride data and GPS coordinates data exported to TCX files from cycling applications (Strava and Cadence) and converted to CSV files.
    • Tools: 
      • Python/PowerShell/Notepad++/GPSBabel
        For converting TCX files into CSV files suitable for mapping and visualization
      • Excel
        For combining data from multiple CSV files into a single CSV dataset for normalizing and creating visualizations.
      • OpenRefine
        For cleaning, organizing, and transforming extracted cycling ride datasets into structured CSV files suitable for mapping and visualization.
      • QGIS
        For geospatial visualization of cycling routes using GPS coordinate data extracted from TCX ride files.
      • Tableau
        For creating dashboards and visualizations showing trends in cycling metrics such as distance, speed, elevation gain, and power output over time.
      • Jupyter Notebook
        For documenting workflow, presenting narrative analysis, and organizing the final computational storytelling project.
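
The TCX-to-CSV conversion step listed above can be sketched with Python's standard library. This is a minimal illustration rather than the project's actual script: the function names and CSV column names are assumptions, and only the core trackpoint fields of the TCX v2 schema are read. Power values, which typically sit in an extension element, are left out of this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by Garmin's Training Center XML (TCX) v2 schema.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}
# Column names are illustrative choices, not part of the TCX format.
FIELDS = ["time", "latitude", "longitude", "altitude_m", "distance_m"]


def trackpoints(tcx_text):
    """Yield one dict per <Trackpoint>; missing values become empty strings."""
    root = ET.fromstring(tcx_text)
    for point in root.iterfind(".//tcx:Trackpoint", NS):
        yield {
            "time": point.findtext("tcx:Time", "", NS),
            "latitude": point.findtext(
                "tcx:Position/tcx:LatitudeDegrees", "", NS),
            "longitude": point.findtext(
                "tcx:Position/tcx:LongitudeDegrees", "", NS),
            "altitude_m": point.findtext("tcx:AltitudeMeters", "", NS),
            "distance_m": point.findtext("tcx:DistanceMeters", "", NS),
        }


def tcx_to_csv(tcx_text):
    """Return the trackpoints of one TCX document as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(trackpoints(tcx_text))
    return out.getvalue()
```

A CSV with one row per trackpoint and explicit latitude and longitude columns can be loaded into QGIS as a delimited text layer, while per-ride summaries for Tableau would need a further aggregation step.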
5. Popular Professions in 1911 Charlotte

  • Author: Emma LYONS
  • Abstract: Using the 1911 Directory, I will clean and sort the data in OpenRefine to identify the most common jobs at the time. I will then analyze popular jobs by race, marital status, or gender, where possible. Through this research, I will ask: Were the most popular professions segregated? Were particular industries segregated? Are any of these professions still prevalent today? Were these jobs specific to Charlotte? After I identify jobs of interest, I will represent the information visually to make it more digestible.
  • Dataset: Full 1911 Charlotte City Directory (16,000 entries)
  • Tools: Excel, OpenRefine, Tableau, Neo4j, and Jupyter Notebook
  • Video: https://youtu.be/pCyIbmreESY (20′ 25″)
6. Creating Archival Datasets for Beginners: Using both Traditional Methods and AI

  • Author: Amy DANIEL
  • Abstract: Explores and contrasts the ways one can create a dataset from digitized documents using traditional methods (OCR) and AI.
  • Dataset: Selected pages from the Charlotte 1911 Directory
  • Tools: ABBYY FineReader, OpenRefine, Google NotebookLM, Excel, and Jupyter Notebook
  • Video: https://youtu.be/0zXk2lYbYTo (22′ 12″)
7. Analyzing Language Disparities: Exploring the 1926 Irish Census for the Boola District Electoral Division (DED)

  • Authors: Connor BUCKLEY, Mati KASSAYE, Ben POLLOCK
  • Abstract: Just this year (2026), the 1926 Census of Ireland was made available to the public by the National Archives of Ireland. A searchable database leads users directly to scans of the Census documents filled out by each and every family in Ireland. These individual documents, specifically those from the town of Knocktoosh in the District Electoral Division (DED) of Boola in County Limerick, became the dataset we used for this exploration.
    Our decision to explore this data was prompted by the familial ties of one of our group members (Connor). The image below shows members of Connor’s family as recorded in one of the twenty-eight Census forms (covering 131 individuals in total) that we incorporated into our dataset.
    Objectives: This project aims to explore the key demographic markers of gender, occupation, and language proficiency (as represented in the Census forms) in Knocktoosh. By highlighting occupation, language, and gender, we intend to compare and contrast the residents of Knocktoosh as individuals while also creating an in-depth snapshot of the village as it was 100 years ago.
  • Dataset: 1926 Census of Ireland entries for the 131 residents of Knocktoosh, Co. Limerick. This dataset is available through the National Archives of Ireland’s website.
  • Tools: ABBYY FineReader, Lido’s OCR, OpenRefine, Neo4j, QGIS, and Jupyter Notebook
  • Video: https://vimeo.com/1194430623/38fd9fe3ce (20′ 49″)
8. Economic Dimensions of African American Neighborhoods 1911 Charlotte

  • Author: Jada YOLICH
  • Abstract: This project explores fundamental questions about the economic viability of Black neighborhoods. I looked at rates of home ownership, employment rates and job types, and the number and locations of businesses in African American neighborhoods, and then used that data to roughly gauge what economic opportunity existed in these locales. To accomplish this, I used OpenRefine to clean the dataset; Python to calculate the latitudes and longitudes of residents’ addresses; QGIS to plot those geographic coordinates; Tableau to visualize the data and gain a clearer understanding of the narrative it tells; and Neo4j to answer some lingering questions.
  • Dataset: Partial 1911 Charlotte City Directory (4,594 entries)
  • Tools: OpenRefine, Neo4j, QGIS, Tableau, Jupyter Notebook, and Python
  • Video: https://youtu.be/-4VWuKsT8pY (19′ 43″)
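
The Python geocoding step mentioned in the abstract might look roughly like the sketch below. It is not the author's code: the `build_query` and `geocode_rows` helpers and the `address` field name are hypothetical, and the geocoding service is deliberately left as a pluggable function because the source does not say which one was used. Street names from 1911 may also have changed or disappeared, so some addresses would need manual review.

```python
def build_query(street_address, city="Charlotte", state="NC"):
    """Turn a directory address into a single geocoder query string."""
    cleaned = " ".join(street_address.split())  # collapse stray whitespace
    return f"{cleaned}, {city}, {state}"


def geocode_rows(rows, geocode, address_field="address"):
    """Return copies of the rows with latitude and longitude added.

    `geocode` is any callable that takes a query string and returns a
    (latitude, longitude) tuple, or None when the address is not found.
    """
    cache = {}
    results = []
    for row in rows:
        query = build_query(row[address_field])
        if query not in cache:  # many residents share an address
            cache[query] = geocode(query)
        coords = cache[query]
        lat, lon = coords if coords else (None, None)
        results.append({**row, "latitude": lat, "longitude": lon})
    return results
```

Caching by query string keeps the number of lookups down when several directory entries share one address, and rows that could not be located keep empty coordinates so they can be filtered out before plotting in QGIS.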
9. Analyzing Migration and Population Movement Records: a Personal Exploration

  • Author: Gregory SZWARCMAN
  • Abstract: Generate OCR for family records belonging to my great-grandparents, both of whom emigrated from Italy in the early 20th century. My great-grandfather, Luigi, immigrated to the U.S. in 1914, just months before WWI broke out. The oldest records I have are his Italian passport and military records, circa 1913. My goal is to extract metadata from the documents. Luigi died young in 1926, so it would be nice to be able to glean more info about his life from these documents.
  • Dataset: Ellis Island Passenger Lists
  • Tools: OpenRefine, Google NotebookLM, Jupyter Notebook
  • Video: https://youtu.be/Dwzt-Jeujoc (16′ 32″)

-Authored by Richard Marciano