SUMMER 2024: Students Engage with Innovative Technologies to Explore the Future Processing of Archival Collections through spatial, graph & genAI Techniques
Credits: knowledge graph in header by Andrea Svejda (1950 Census for Asheville ED58: household heads with jobs by street)
Over a 6-week period, nineteen students engaged in a series of single-credit experimental summer courses that explored the future of library and archives collection processing:
- INST608A focused on spatial representation and analysis
- INST608C focused on network representation and analysis
- INST608G focused on generative AI and Large Language Models (LLMs)
The nineteen students came from the following programs:
- MLIS (14): Jason Benner, Chelsea Clarke, Meg Fletcher, Etana Laing, Hope Lomvardias, Jasper Nash, Leland Sampson, Tiffany Porter, Nick de Raet, Andrea Svejda, Tahura Turabi, Matthew Turner, Ady Weng, Lauren White
- MIM (1): Yasmin Bromir
- HCIM (2): Sarah Engleman, Andrea Tavakol
- Doctoral (2): Imdad Baloch (International Education Policy program at the College of Education), Liang Zhou (Department of Agricultural and Resource Economics at the College of Agriculture and Natural Resources)
Enrollment overlapped across the three courses: 15 students took the G course (genAI), 12 took the C course (network), and 11 took the A course (spatial). Teaching three overlapping cohorts, with students taking one, two, or all three courses, made this a challenging but valuable experience!
Overview:
- A (spatial) was about putting people and places extracted from archival collections on a map. Students experimented with geocoding, geolocating, georeferencing, vectorization, and spatial analysis. Tools used included the open-source QGIS.
- C (network) was about exploring graphs, a powerful way of looking at relationships between people, places, and events found in library and archival collections. Students experimented with graph data modeling, knowledge map building, graph querying, graph visualization, graphs at scale, and graph algorithms. Tools used included the NoSQL graph database Neo4j.
- G (genAI) was about harnessing advances in generative AI and Large Language Models (LLMs). Students experimented with prompt engineering, advanced data analysis, image analysis, and retrieval-augmented generation (RAG). Tools used included Google NotebookLM and OpenAI ChatGPT (GPT-4o).
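To give a flavor of the graph data modeling and querying explored in the network course, here is a minimal sketch in plain Python. It is not the Neo4j workflow the students used, and the records, relationship names (LIVES_ON, WORKS_AS), and function names are hypothetical placeholders rather than real census or Asheville data. It only illustrates the general idea of turning tabular records into nodes and relationships and then grouping household heads and their jobs by street.

```python
from collections import defaultdict

# Hypothetical placeholder records, NOT real census or Asheville data.
records = [
    {"head": "Person A", "street": "Street X", "job": "Barber"},
    {"head": "Person B", "street": "Street X", "job": "Teacher"},
    {"head": "Person C", "street": "Street Y", "job": "Barber"},
]


def build_graph(rows):
    """Turn each record into (source, relationship, target) edges."""
    edges = []
    for row in rows:
        edges.append((row["head"], "LIVES_ON", row["street"]))
        edges.append((row["head"], "WORKS_AS", row["job"]))
    return edges


def heads_by_street(edges):
    """Group household heads (with their jobs) by street."""
    # Look up each person's job from the WORKS_AS edges.
    jobs = {src: dst for src, rel, dst in edges if rel == "WORKS_AS"}
    grouped = defaultdict(list)
    # Follow LIVES_ON edges from each person to their street.
    for src, rel, dst in edges:
        if rel == "LIVES_ON":
            grouped[dst].append((src, jobs.get(src)))
    return dict(grouped)


edges = build_graph(records)
by_street = heads_by_street(edges)

for street, heads in by_street.items():
    print(street, heads)
```

In a graph database, the same question would typically be expressed as a pattern-matching query over nodes and relationships instead of the explicit loops shown here.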
To tie these courses together, students used cultural content from the Asheville N.C. Urban Renewal Project (https://www.youtube.com/watch?v=kbjZTA0r5V8), including data that was recently contributed to the Asheville Racial Reparations Commission.
What was remarkable was that halfway through these summer courses, on June 17, 2024, the Reparations Commission voted on a cash settlement in the amount of $148K to families and businesses negatively impacted by Urban Renewal (for property value loss), that is, those harmed by the displacement Urban Renewal caused. See the proposal text at: https://drive.google.com/file/d/13ufHIjS5DAkuJLY1_YtYJvw70t93g-Ej/view.
This adopted recommendation, which made history in Asheville (see the video of the vote: https://youtu.be/38xrxZ5haMU), referenced a preliminary list of the names of individuals and businesses in the Southside neighborhood that were impacted and eligible for cash payments. Using innovative techniques from these classes, including genAI content extraction, graph generation and querying, and spatial analysis, students were able to identify some 19 additional businesses that had not been recorded before.
Sample Analytics:
Feedback (based on hands-on exploration of innovative technologies using social justice reparations data):
- Imdad Baloch: The course material not only helped me bridge the gap between theory and real-world application but also enabled me to delve into the concept of Generative AI and various LLMs through collaborative learning with peers and in-depth study of course materials. I gained valuable insights and discovered effective strategies for incorporating these concepts into my learning and research practices. The hands-on experience of analyzing archival data using different LLMs was particularly enlightening, broadening my perspective on leveraging textual information for learning and advancement.
- Jason Benner: I have learned a lot of interesting techniques to streamline processing and visualizing of data. I think that applying these techniques to the Asheville data was great because we were working on a real-world dataset and that provided us with an idea of the actual challenges around using this technology.
- Yasmin Bromir: Overall, I think graph-based approaches have some interesting applications, especially in the library and archive setting.
- Chelsea Clarke: Utilizing the Asheville dataset to explore spatial and network representation and generative AI for use in archives and libraries was a productive approach that enabled me to visualize meaningful ways these technologies can be applied to archives and libraries.
- Sarah Engleman: Using the Asheville data was both interesting and gave me a different perspective on how to use AI with data.
- Meg Fletcher: I really appreciated that all three courses generally used the same data each week and were all centered on the Asheville data, I think it helped make the three classes feel more linked given the separate tools being used in each.
- Etana Laing: If this is to be a semester-long course, I think it should be taught in person or in synchronous virtual sessions. I also explored some of the dark sides of GenAI and LLMs, primarily about the disaster they are to our environment.
- Hope Lomvardias: Through this course, I observed firsthand how today’s AI landscape is an exciting but perilous area to explore. Experimenting with genAI tools with urban renewal and 1960s Asheville, North Carolina revealed how difficult it is to trust AI-generated results.
- Jasper Nash: Overall, this was a great experience for learning about how spatial and network representations can be meaningfully applied to human-centered, archival contexts.
- Tiffany Porter: I really liked working with the Asheville data across all three courses. I think it made the projects productive and more engaging, knowing more about where the information came from and its purpose and tying them across each week and class, building on the others.
- Nick de Raet: I am very grateful for the opportunity to have contributed, in a small way, to the work being done in Asheville. I am hopeful (and thankful) that our work may be of some help to the Asheville community, and perhaps to other communities as well.
- Lee Sampson: I think I have learned a great deal from this class. It is a tremendous challenge to structure a class around an emerging technology. Most courses have easy to articulate learning objectives. That’s not really possible in a course on GenAI because we don’t know yet where the technology will end up.
- Andrea Svejda: All three classes were fascinating and challenging and I resonated most with the INST608A spatial class.
- Andrea Tavakol: Working with both the Asheville data set and the 1950 census data set was a powerful way to interact with and explore not so distant history. It was also interesting to come up with our own questions to answer with the data to then craft together queries. I personally enjoy learning by doing, as well as using the journey as a means to grasp concepts.
- Tahura Turabi: The three courses on computational tools offered this semester were very valuable to me as a student interested in the Digital Humanities. Overall, I believe these three courses have provided me with a deeper understanding of computational tools used for archives as well as their potential for refinement and development.
- Matthew Turner: Using the Asheville data was an excellent way for me to see the potential of these tools. I found the real-life application to be motivating and make the course material make more sense in context. The course has shown me the potential value of AI to community collections.
- Ady Weng: Overall, I believe that it was superbly useful to simultaneously experiment with the Asheville data in all three classes! The exigency and significance of the Asheville data, as conveyed to us through the introduction to the Urban Renewal Impact project in the first week, also made much of the work we did more meaningful. Priscilla Robinson’s walkthrough of her life in Southside has remained at the forefront of my mind throughout these experimentations and what they mean to individuals who rely on them for a sense of what has been lost — and what might be regained, as with reparations.
- Lauren White: The work was all the more meaningful because it centered around the Asheville urban renewal legacies; tying the tools we were learning each week to the current work being done by the Reparations Commission was a really immersive exposure to use cases for these types of tools. Not only did the research and analysis put the people first by pulling them from the records and into the tools, but the work of the urban renewal collaborators engaged with members of the impacted community.
- Liang Zhou: I really enjoyed the journey in the INST608A spatial class as it introduced me to utilizing archival records to make historical information come to life.
Lee Sampson concludes that these single-credit experimental summer courses have uncovered new paths for future courses and elicited emerging themes, and makes the following observations:
- GenAI tools require that appropriate guardrails be put on their output.
- GenAI can be a great teacher, when explaining what code it is producing. Using GenAI to learn Python has been much more effective than trying to learn Python on my own.
- Everyone takes different approaches to prompting, and the variations in results have been extremely useful to learn from.
-Authored by Richard Marciano (with input from students)
P.S.: Special thanks to Lori Perine, Rajesh Kumar Gnanasekaran, and Mark Conrad for co-advising and supporting these courses.