May 15, 2026: Showcase – Computational Archival Storytelling with Jupyter Notebooks

On May 15, 2026, MLIS graduate students in the INST742 class (“Implementation of Digital Curation”) at the University of Maryland iSchool showcased original and innovative work. The class (designed to provide hands-on learning experiences to students, with real-world environments and examples that touch on significant areas of digital curation) concluded with a 2-week final project.

This course is part of a new series of five courses in Computational Archival Science (CAS) (see: https://ai-collaboratory.net/cas/). These courses are designed for MLIS students interested in developing skills in digital curation and computational thinking, but they are also suited to INFO master’s students in other programs, doctoral students, and graduate students from other colleges. They currently include:

  1. Coding for Non-Coders: LBSC708F: Introduction to Computational Archival Science (CAS) & Python
  2. GenAI & LLMs: INST728L: GenAI & Large Language Models (LLMs) for Library and Archive Collections
  3. Graphs: INST608D: Using Network Visualization to Explore Library & Archive Collections
  4. Maps: INST608C: Spatial Representation & Analysis for Library & Archive Collections
  5. Digital Curation/Data Science: INST742: Implementing Digital Curation

Students applied the following computational processing topics throughout the semester:

  • Archival Science concepts and workflows and Computational Thinking (CT)
  • Digitization management (ABBYY FineReader)
  • Cleaning & Transforming (OpenRefine)
  • Clustering algorithms (Artificial Intelligence)
  • Generative AI (NotebookLM)
  • Geospatial transformations through geocoding, geolocating, georeferencing, and vectorizing/tracing (QGIS, ArcGIS)
  • Data visualization (Tableau Story and Tableau Dashboard)
  • Network analysis through graph databases (Neo4j)
  • Digital Curation at scale
  • Virtual machines (Azure Labs, Sandbox tools, Jupyter Notebooks)

This year again, the 15-week class was articulated around a single collection, consisting of a sample of the 1911 Charlotte NC city directory. Students had the choice to continue with this dataset or choose one of their own.

Professor Jane Greenberg (Director of the Metadata Research Center at Drexel U. and AI-Collaboratory Co-Founder and CAS Leader) stated: “This work beautifully demonstrates the growing significance of Computational Archival Science (CAS) in supporting research and knowledge discovery. It also highlights the important role CAS plays in bridging archives, AI, and emerging computational approaches. It is inspiring to see students develop and apply critical skills in spatial representation, data analysis, and archival research, demonstrating how computational methods can open new pathways for analysis, interpretation, and knowledge creation. Congratulations to all the students and their instructor for advancing such innovative and important work!”

Mark Conrad, Co-Founder of the AI-Collaboratory and CAS Initiative, and former digital archivist for the National Archives, shared that: “Dr. Marciano has demonstrated how to effectively bridge the gap between theory and practice. In preparing students to become information professionals, there is no substitute for hands-on experience with some of the tools they are likely to encounter. The students clearly stepped up to the challenge. Their projects show a clear understanding of the tools and how to creatively use them. Congratulations to all!”

Nine final student projects were showcased:

  1. Allison SCHOENAUER: Networks of Alumni: Visualizing Employment Trends of Early Farm School Alumni
  2. Jeni CROCKETT-HOLME: Snapshot of Tryon Street in 1911 Charlotte
  3. Eleanor VANDER LAAN: Expanding the Charlotte 1911 Directory Through Reparative Metadata Creation
  4. Caleb HURLEY: Tour De Caleb: Mapping and Visualizing a Cycling Training Season
  5. Emma LYONS: Popular Professions in 1911 Charlotte
  6. Amy DANIEL: Creating and Cleaning Datasets Using GenAI
  7. Connor BUCKLEY, Mati KASSAYE, Ben POLLOCK: Interacting with Irish-Speaking Residents of Knocktoosh, Ireland: An Exploration of the 1926 Census for the Boola District Electoral Division (DED)
  8. Jada YOLICH: Economic Dimensions of African American Neighborhoods 1911 Charlotte
  9. Gregory SZWARCMAN: Analyzing Migration and Population Movement Records: a Personal Exploration
1. Networks of Alumni: Visualizing Employment Trends of Early Farm School Alumni

  • Author: Allie SCHOENAUER
  • Abstract: This project intends to visualize employment trends of the first ten years of alumni from McDonogh School, a farm and military school located in what is now Owings Mills, Maryland.  This study will be looking at factors like location, commonly recurring businesses, industry, and graduation status.
  • Dataset: McDonogh School Registry, The Week Vol. 1, Weeks Baltimore Directory 1883
  • Tools: ABBYY FineReader, Neo4j, Tableau, and Jupyter Notebook
  • Video: TBPosted
2. Snapshot of Tryon Street in 1911 Charlotte

3. Expanding the Charlotte 1911 Directory Through Reparative Metadata Creation

  • Author: Eleanor VANDER LAAN
  • Abstract: The current format of the Charlotte 1911 Directory does not afford married women with living husbands their own line in the directory; instead, their personhood is buried in parentheses next to the name of their husband. Having them share the same entry not only takes away their importance as adults in the household, but it also makes data analysis based on single directory entries difficult.
  • My project is inspired by the concept of reparative archival description. It is very common for contemporary archives to update collection metadata that once only included the name of a husband to include both the husband’s and the wife’s names. For example, instead of “Mr. and Mrs. John Smith,” metadata would be “Mr. John Smith and Mrs. Jane Smith,” or even “John Smith and Jane Smith.” This project takes these principles and applies them to the directory, subsequently changing the data as well.
  • Research Question: How does adding directory entries for the hidden wives change the Charlotte 1911 Directory dataset? Specifically, how does it change the demographic ratios regarding gender, race, marriage rates, and household size once wives are given their own entries?
  • Project Idea: Using Python, split out the wives’ names and give them their own entries in the dataset, copying over the rest of the line as well EXCEPT for the husband’s first name (ex: “Smith John (Jane) 111 S Tryon” -> “Smith John, 111 S Tryon” –line break– “Smith Jane, 111 S Tryon”).
    • Or I might think about how I can show their relationship as married to each other. Possibly doing “Smith John (Jane) 111 S Tryon” -> “Smith John (Jane) 111 S Tryon” –line break– “Smith Jane (John), 111 S Tryon” to indicate the first name of their spouse.

  • Dataset: Full 1911 Charlotte City Directory (16,000 entries), possibly Mecklenburg County 1910 Census
  • Tools: OpenRefine, Tableau, Neo4j, Python, and Jupyter Notebook
  • Video: TBPosted
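
The Project Idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's actual notebook code: it assumes every entry follows the “Surname Given (Spouse) rest of line” shape of the Smith example, and the names `ENTRY_WITH_SPOUSE` and `split_spouse_entry` are hypothetical. Real directory lines may use parentheses for other purposes, so any output would need manual review.

```python
import re

# Assumed line shape, taken from the "Smith John (Jane) 111 S Tryon" example:
# surname, given name(s), spouse's name in parentheses, then the rest of the entry.
ENTRY_WITH_SPOUSE = re.compile(
    r"^(?P<surname>\S+)\s+(?P<given>[^()]+?)\s+\((?P<spouse>[^()]+)\)\s*(?P<rest>.*)$"
)


def split_spouse_entry(line, cross_reference=False):
    """Return one or two entries for a raw directory line.

    Lines without a parenthetical name are returned unchanged.
    With cross_reference=True each spouse keeps the other's first name
    in parentheses (the second option described in the Project Idea).
    """
    text = line.strip()
    match = ENTRY_WITH_SPOUSE.match(text)
    if match is None:
        return [text]

    surname, given, spouse, rest = match.group("surname", "given", "spouse", "rest")
    if cross_reference:
        first = f"{surname} {given} ({spouse}) {rest}"
        second = f"{surname} {spouse} ({given}) {rest}"
    else:
        first = f"{surname} {given}, {rest}"
        second = f"{surname} {spouse}, {rest}"
    # Trim trailing separators left behind when the rest of the line is empty.
    return [first.rstrip(" ,"), second.rstrip(" ,")]


if __name__ == "__main__":
    for entry in split_spouse_entry("Smith John (Jane) 111 S Tryon"):
        print(entry)
```

Keeping the rule in one small function makes it easy to run over the whole dataset and then compare gender and household counts before and after the split.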
4. Tour De Caleb: Mapping and Visualizing a Cycling Training Season

    • Author: Caleb HURLEY
    • Abstract: This project will explore how personal fitness and movement data can be transformed through data curation, visualization, and geospatial analysis. Using approximately seven months of cycling training data stored in TCX (Training Center XML) files, the project will analyze patterns in cycling activity over time, including route evolution, distance traveled, average speed, elevation gain, and power output. The intent of the project is to investigate how born-digital fitness data can be curated and interpreted as both quantitative and narrative evidence of behavioral patterns, physical progression, and geographic movement. Rather than focusing only on statistical summaries, the project will emphasize storytelling through visualizations and mapping. The project asks the following research question: How can personal cycling telemetry data be transformed into a computational narrative about training progression, spatial movement, and physical activity over time?
      The project will involve extracting and cleaning TCX ride data, converting GPS coordinate information into usable geographic datasets, and creating visualizations that illustrate trends in training volume and performance. GPS route data from rides will be visualized in QGIS to create maps tracing ride locations over time. Tableau dashboards and graphs will be used to visualize changes in ride distance, average speed, elevation, and power metrics across the seven-month period.
      The final deliverable will be presented in a Jupyter Notebook that combines methodology, narrative explanation, and visual outputs into a cohesive, multi-phase computational storytelling project. The project’s narrative moves from geospatial data without power metrics, to power-based structured data without geospatial data, and then to training records with both data types combined. The notebook will document the workflow used to transform raw TCX telemetry data into curated datasets and visual representations. The project also aims to demonstrate how digital humanities and data curation methodologies can be applied to fitness datasets rather than traditional archival or historical records.
    • Dataset: Personal cycling ride data and GPS coordinates data exported to TCX files from cycling applications (Strava and Cadence) and converted to CSV files
    • Tools:
      • OpenRefine
        For cleaning, organizing, and transforming extracted cycling ride datasets into structured CSV files suitable for mapping and visualization.
      • QGIS
        For geospatial visualization of cycling routes using GPS coordinate data extracted from TCX ride files.
      • Tableau
        For creating dashboards and visualizations showing trends in cycling metrics such as distance, speed, elevation gain, and power output over time.
      • Jupyter Notebook
        For documenting workflow, presenting narrative analysis, and organizing the final computational storytelling project.
  • Video: TBPosted
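
The TCX-to-CSV step described in this project could look roughly like the sketch below. It is not the author's workflow; it assumes the files use the standard Garmin TrainingCenterDatabase v2 namespace and reads only the core trackpoint fields (time, position, altitude, distance). Power values usually live in vendor extension elements and are left out here. The function and column names (`trackpoints_from_tcx`, `rows_to_csv`, `altitude_m`, `distance_m`) are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by standard TCX files (assumed to apply to the exported rides).
TCX_NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Output columns; the names are illustrative choices, not part of the TCX format.
FIELDS = ["time", "latitude", "longitude", "altitude_m", "distance_m"]


def trackpoints_from_tcx(tcx_text):
    """Extract one dict per <Trackpoint>; missing values become empty strings."""
    root = ET.fromstring(tcx_text)
    rows = []
    for point in root.iterfind(".//tcx:Trackpoint", TCX_NS):
        rows.append({
            "time": point.findtext("tcx:Time", default="", namespaces=TCX_NS),
            "latitude": point.findtext(
                "tcx:Position/tcx:LatitudeDegrees", default="", namespaces=TCX_NS),
            "longitude": point.findtext(
                "tcx:Position/tcx:LongitudeDegrees", default="", namespaces=TCX_NS),
            "altitude_m": point.findtext(
                "tcx:AltitudeMeters", default="", namespaces=TCX_NS),
            "distance_m": point.findtext(
                "tcx:DistanceMeters", default="", namespaces=TCX_NS),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the extracted trackpoints as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A CSV in this shape has explicit latitude and longitude columns, so it can be loaded into QGIS as a delimited text layer or into Tableau and OpenRefine as an ordinary table.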
5. Popular Professions in 1911 Charlotte

  • Author: Emma LYONS
  • Abstract: Using the 1911 Directory, I will clean and sort the data in OpenRefine to identify the most common jobs at the time. I will then analyze popular jobs by race, marital status, or gender, if possible. Through this research, I will ask: Were the most popular professions segregated? Were particular industries segregated? Are any of these professions still prevalent today? Were these jobs specific to Charlotte? After I identify jobs of interest, I will represent the information visually to make it more digestible.
  • Dataset: Full 1911 Charlotte City Directory (16,000 entries)
  • Tools: Excel, OpenRefine, Tableau, Neo4j, and Jupyter Notebook
  • Video: TBPosted
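
The project lists OpenRefine and Tableau for this analysis; as a complementary sketch for the Jupyter Notebook, the same "most common jobs" count could be reproduced in Python. Everything here is an assumption for illustration: the cleaned data is taken to be a list of dictionaries with a hypothetical `occupation` key and an optional grouping key such as `gender` or `race`, and `top_occupations` is an invented helper name.

```python
from collections import Counter


def normalize_occupation(value):
    """Lowercase and collapse whitespace so 'Clerk ' and 'clerk' count together."""
    return " ".join(value.strip().lower().split())


def top_occupations(records, n=10, group_key=None):
    """Count occupations, optionally split by a grouping column.

    records   -- list of dicts with a (hypothetical) 'occupation' key
    group_key -- optional column name to group by; None counts everyone together
    Returns {group: [(occupation, count), ...]} with at most n entries per group.
    """
    counts = {}
    for record in records:
        occupation = normalize_occupation(record.get("occupation", ""))
        if not occupation:
            continue  # skip entries with no occupation listed
        group = record.get(group_key, "unknown") if group_key else "all"
        counts.setdefault(group, Counter())[occupation] += 1
    return {group: counter.most_common(n) for group, counter in counts.items()}
```

Calling `top_occupations(records, group_key="gender")` gives one ranked list per group, which is one way to start comparing whether popular professions were segregated.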
6. Creating and Cleaning Datasets Using GenAI

  • Author: Amy DANIEL
  • Abstract: This project explores what it takes to transform a dataset, using Google NotebookLM, so that it resembles a cleaned-up dataset from OpenRefine. Some of the questions I’ll be exploring include: Can GenAI create a tabular dataset from the pages of the Charlotte Directory using a single prompt? If not, how many prompts does it take to achieve the desired results? Can the prompts be applied and scaled to a larger dataset? What are the significant differences between using GenAI and OpenRefine to create and clean a dataset from archival records?
  • Dataset: Full 1911 Charlotte City Directory (16,000 entries)
  • Tools: Google NotebookLM, OpenRefine, and Jupyter Notebook
  • Video: TBPosted
7. Interacting with Irish-Speaking Residents of Knocktoosh, Ireland: An Exploration of the 1926 Census for the Boola District Electoral Division (DED)

  • Authors: Connor BUCKLEY, Mati KASSAYE, Ben POLLOCK
  • Abstract: The 1926 Census collected demographic data from all residents in Ireland. Collected demographic data included occupation titles, languages spoken and language proficiency, and gender identities. This project aims to explore these key demographics in the village of Knocktoosh located in Boola Townland, Co. Limerick. By highlighting occupation, language, and gender, we intend to compare and contrast the residents of Knocktoosh as individuals while also creating an in-depth snapshot of the village as it was 100 years ago.
  • Dataset: 1926 Census of Ireland entries for the 131 residents of Knocktoosh, Co. Limerick. This dataset is available through the National Archives of Ireland’s website.
  • Tools: ABBYY FineReader, OpenRefine, Neo4j, Excel, and Jupyter Notebook
  • Video: TBPosted
8. Economic Dimensions of African American Neighborhoods 1911 Charlotte

  • Author: Jada YOLICH
  • Abstract: This project explores the dimensions of life in 1911 Charlotte’s African American neighborhoods. What were the local businesses? What types of jobs did African Americans hold and, for the time, were these “good” jobs that set the foundations for a Black middle-class neighborhood? Did folks have to accept far commutes in exchange for better pay, or was there sufficient employment opportunity in their neighborhoods? First, I will use OpenRefine to clean the data and create a dataset that only comprises African Americans, and then use Neo4j and Tableau to gain a clearer understanding of who had what jobs and create visualizations based on that understanding. I will also use OpenRefine to create a dataset of African American individuals who have home addresses listed in their entries and plot the addresses in QGIS to create a map of predominantly African American neighborhoods. I plan to do something similar for local businesses. Finally, I would like to use Python to determine the distance between people’s home addresses and their employment addresses and analyze how far people were from the jobs they held. That data will again be visualized in Tableau.
    There is a component of archival research in this project, particularly to determine what a “good” job is in Charlotte 1911.
  • Dataset: Full 1911 Charlotte City Directory (16,000 entries)
  • Tools: OpenRefine, Neo4j, QGIS, Tableau, Jupyter Notebook, and Python
  • Video: TBPosted
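
The home-to-work distance step mentioned in the abstract could be approached with the haversine formula, sketched below. This assumes the addresses have already been geocoded to latitude/longitude pairs (for example in QGIS), and it measures straight-line distance over the Earth's surface, not the route a person would have walked or ridden in 1911. The record keys (`home`, `work`) and function names are hypothetical.

```python
import math

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def commute_distances(records):
    """Add a 'commute_km' value to each record that has both coordinate pairs.

    Each record is assumed to hold 'home' and 'work' as (lat, lon) tuples;
    records missing either one get commute_km = None.
    """
    results = []
    for record in records:
        home, work = record.get("home"), record.get("work")
        distance = haversine_km(*home, *work) if home and work else None
        results.append({**record, "commute_km": distance})
    return results
```

At city scale the spherical approximation is far more precise than historical geocoding is likely to be, so address matching, not the distance formula, will dominate the error.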
9. Analyzing Migration and Population Movement Records: a Personal Exploration

  • Author: Gregory SZWARCMAN
  • Abstract: Generate OCR for family records belonging to my great-grandparents, both of whom emigrated from Italy in the early 20th century. My great-grandfather Luigi immigrated to the U.S. in 1914, just months before WWI broke out. The oldest records I have are his Italian passport and military records circa 1913. My goal would be to extract metadata from the documents. Luigi died young in 1926, so it would be nice to be able to glean more information about his life from these documents.
  • Dataset: Italian passport and military records
  • Tools: ABBYY FineReader and Google NotebookLM
  • Video: TBPosted

-Authored by Richard Marciano