May 11, 2023: Datathon Showcase – Computational Archival Storytelling with Jupyter Notebooks

On May 11, 2023, a public event showcased original and innovative work by MLIS graduate students in the INST742 class (“Implementation of Digital Curation”) at the University of Maryland iSchool. The class, designed to give students hands-on learning experiences with real-world environments and examples that touch on significant areas of digital curation, concluded with a 2-week final project. Digital curation implementation topics explored in INST742 included:

  • Archival Science concepts and workflows and Computational Thinking (CT)
  • Digitization management (ABBYY FineReader)
  • Cleaning & Transforming (OpenRefine)
  • Data Wrangling (Trifacta)
  • Clustering algorithms (Artificial Intelligence)
  • Text Processing through NLP and NER (GATE: General Architecture for Text Engineering)
  • Geospatial Transformations through geocoding, geolocating, georeferencing, and vectorizing/tracing (QGIS, ArcGIS)
  • Data visualization (Tableau Storyline and Tableau Dashboard)
  • Network analysis through graph databases (Neo4j)
  • Digital Curation at scale
  • Virtual machines (UMD iSchool Virtual Computing Lab (VCL), Sandbox tools, Jupyter Notebooks)

This year, the entire 15-week class was organized around a single collection: a sample of the 1911 Charlotte, NC city directory.

“City directories are among the most important sources of information about urban areas and their inhabitants. They provide personal and professional information about a city’s residents as well as information about its business, civic, social, religious, charitable, and literary institutions.” [Library of Congress].

Speakers and topics included:

  1. Eden Hansen: Mad or Madam: Investigating an Undefined Data Term
  2. Sams Wilson: Mapping Over Time in Charlotte NC: Population, Redlining, and Urban Renewal
  3. Bethany Greenho: Building a Bigger Picture: A Case Study of Combining the General City and Business Directories
  4. Rosemarie Fettig: Expanding the Network: Modeling Relationships with Neo4j
  5. Valerie Sallis: Revisualizing Geographic Disparities: Examining Trends in Racial and Economic Inequality on the Streets w/o GIS
  6. Mia Steinle: Religious Life in 1911 Charlotte, NC
  7. Sarah Craig: Gender, Race, and Archival Silences
  8. Elissa Dallimore: Conceptualizing Prosperity: A Case Study Analyzing Housing through Job Types
  9. Henry Kemp: Visualizing Neighborhood Demographics
  10. Isaiah Cornfield: Race, Marriage, and Profession: Data at Scale Test Case

Presentation details and recordings:

1. Mad or Madam: Investigating an Undefined Data Term:

  • Author: Eden Hansen
  • Abstract: Investigating references of managers of brothels in the 1911 Charlotte City Directory.
  • Dataset: Full datafied Directory (16,000 entries), 1911 Sanborn map, 1910 Census
  • Tools: OpenRefine, Tableau
  • Video: https://youtu.be/8fu0UrGJRfE (8′ 03″)
2. Mapping Over Time in Charlotte NC: Population, Redlining, and Urban Renewal:

  • Author: Sams Wilson
  • Abstract: Understanding correlations of urban policies over time by plotting 1911 historical directory address locations, using 1911, 1937, and 1972 data.
  • Dataset: Full datafied Directory (16,000 entries), Mapping Inequality (1937 HOLC redlining map), 1972 Urban Renewal Map
  • Tools: QGIS, OpenRefine, Python (Pandas)
  • Video: https://youtu.be/9UWFnz-afqU (14′ 56″)
3. Building a Bigger Picture: A Case Study of Combining the General City and Business Directories:

  • Author: Bethany Greenho
  • Abstract: Augmenting content with business categories to reveal the employment landscape of Charlotte in 1911.
  • Dataset: General directory (16,000 entries) and datafied Business directory
  • Tools: ABBYY FineReader OCR, OpenRefine, Tableau
  • Video: https://youtu.be/4pmFzRwO_1o (11′ 48″)
4. Expanding the Network: Modeling Relationships with Neo4j:

  • Author: Rosemarie Fettig
  • Abstract: Using a design-oriented approach to experiment with different data models for a graph database representing relationships between individuals listed in the 1911 Charlotte City Directory.
  • Dataset: 48-record modified subset of the 1911 Charlotte City Directory.
  • Tools: Arrows.app, Microsoft Excel, Neo4j (Browser edition)
  • Video: https://youtu.be/CGJgXU0o-U8 (12′ 30″)
5. Revisualizing Geographic Disparities: Examining Trends in Racial and Economic Inequality on the Streets w/o GIS:

  • Author: Valerie Sallis
  • Abstract: Examining racial disparities in key areas of Charlotte, using faceted clustering of types of homes.
  • Dataset: Full datafied Directory (16,000 entries), 1911 Sanborn map, 1910 Census
  • Tools: Excel, OpenRefine, Tableau
  • Video: https://youtu.be/hyeYfOKnFBs (9′ 57″)
6. Religious Life in 1911 Charlotte, NC:

  • Author: Mia Steinle
  • Abstract: Identifying places of worship using regular expressions, in order to reveal them by denomination and race, with an emphasis on archival silences.
  • Dataset: Full datafied Directory (16,000 entries), 1911 Sanborn map, 1910 Census
  • Tools: OpenRefine, Tableau
  • Video: https://youtu.be/uIbvZMRW_-I (11′ 10″)
7. Gender, Race, and Archival Silences:

  • Author: Sarah Craig
  • Abstract: Data visualization and analysis of women in the historical city directory (gender, married and widow status). Inferring meaning in the context of archival silences and ambiguity.
  • Dataset: Full datafied Directory (16,000 entries)
  • Tools: OpenRefine, Tableau
  • Video: https://youtu.be/tJI28XOcMmU (12′ 40″)
8. Conceptualizing Prosperity: A Case Study Analyzing Housing through Job Types:

  • Author: Elissa Dallimore
  • Abstract: Visualizing housing types by job types to understand the employment landscape of Charlotte in 1911.
  • Dataset: Full datafied Directory (16,000 entries)
  • Tools: OpenRefine, Tableau
  • Video: https://youtu.be/MaD9lIYM7iY (9′ 07″)
9. Visualizing Neighborhood Demographics:

  • Author: Henry Kemp
  • Abstract: Looking at heavily populated neighborhoods of Charlotte through the geocoding of historical addresses in QGIS.
  • Dataset: Full datafied Directory (16,000 entries), Sanborn maps, 1910 Census
  • Tools: OpenRefine, QGIS, Tableau
  • Video: https://youtu.be/85-qL-VBY14 (12′ 04″)
10. Race, Marriage, and Profession: Data at Scale Test Case:

  • Author: Isaiah Cornfield
  • Abstract: Visualizing correlations between race, employment, and marital status.
  • Dataset: Full datafied Directory (16,000 entries)
  • Tools: OpenRefine, Tableau
  • Video: https://youtu.be/hyeYfOKnFBs (7′ 41″)

The speakers were joined by 6 members of the IMLS-funded TALENT Network project (educators, archivists, and technologists) and the AIC Research Network:

  1. Richard Marciano: INST742 Instructor, UMD
  2. Rogers Hall: Professor & Chair, Dep. of Teaching and Learning, Vanderbilt U.
  3. Mark Conrad: former digital archivist at the National Archives, Advanced Information Collaboratory (AIC)
  4. Greg Jansen: Senior Research Software Architect, U. Maryland iSchool
  5. Sarah Buchanan: Associate Professor, School of Information Science & Learning Technologies, U. Missouri
  6. Mark Hedges: Professor, Chair of the Digital Humanities Dep., King’s College London

-Authored by Richard Marciano