8th COMPUTATIONAL ARCHIVAL SCIENCE (CAS) WORKSHOP
Sunday, Dec. 17, 2023

Part of: 2023 IEEE Big Data Conference (IEEE BigData 2023) http://bigdataieee.org/BigData2023/ (Sorrento, Italy) – Dec. 15-18, 2023

Sunday, Dec. 17, 2023
–all times are in Central European Time, UTC+1–
Boardroom #6


SCHEDULE:
  – 8:50 – 9:00 WELCOME
  – 9:00 – 10:00 SESSION 1: Classification & Annotation [3 talks]
               10:00 – 10:30 COFFEE BREAK
  – 10:30 – 11:30 SESSION 2: Authenticity & Trust [3 talks]
  – 11:30 – 12:30 SESSION 3: Emerging Challenges & Opportunities [3 talks]
               12:30 – 2:00 LUNCH BREAK
  – 2:00 – 3:00 2 KEYNOTE SPEAKERS — #1: G. JANSEN & #2: R. K. GNANASEKARAN
  – 3:00 – 4:00 SESSION 4: Generative AI and LLMs [3 talks]
               4:00 – 4:30 COFFEE BREAK
  – 4:30 – 5:00 1 INVITED SPEAKER (virtual) — E. FRONTONI
  – 5:00 – 5:30 SESSION 5: In Memoriam — Dr. Michael J. Kurtz
               6:00 – 9:00 BANQUET


8:50 – 9:00 WELCOME

  • Workshop Chairs:
    Victoria Lemieux 1, Richard Marciano 2, Mark Hedges 3

     1 U. British Columbia CANADA  /  2 U. Maryland USA /  3 King’s College London UK 


9:00 – 10:00 SESSION 1: Classification & Annotation

  • 9:00-9:20 #1: The Sequel: The Development of a Novel Context Capturing Method for the Functional Auto Classification of Records (S01209)
    Dr. Nathaniel Payne [School of Library, Archival, and Information Studies (iSchool) University Of British Columbia, CANADA]

    • PAPER SLIDES


  • 9:20-9:40 #2: Specimen Outlining: A Computational Archival Science Approach (S01216)
    David Breen, Andrew Senin, Ajani Levere, Joel Pepper, Jane Greenberg [Department of Computer Science Drexel University, Philadelphia, PA, USA / Department of Information Science Drexel University, Philadelphia, PA, USA]

    • PAPER  SLIDES

      ABSTRACT: Computational archival science (CAS) provides new pathways for research. Biologists, for example, can perform scientific studies by applying AI/ML to digital biological specimen collections and explore questions that were not possible in the analog world. One such approach is the application of computational methods for specimen outlining to assist with specimen identification, morphometry, and other scientific questions. The challenge is to determine how to computationally generate and represent a specimen’s outline. The research presented in this paper addresses this challenge, through the creation of elliptical Fourier descriptors (EFDs). The paper describes the image processing pipeline for extracting fish outlines, a key morphological feature, and representing the outlines using EFDs. In addition, our research presents the application of machine learning classification on the EFDs. The resulting dataset is well suited for a variety of machine learning-based downstream analyses, including classification by genus and species. Overall, the classification tests produced a 96.3% accuracy, demonstrating the distinguishing nature of the EFDs, and by proxy, the fish outlines as a whole. Broadly, these results indicate the effectiveness of archival specimen usage in machine learning applications, and demonstrate specimen outlining via Fourier descriptors as a computational archival science approach.
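      The descriptor idea in the abstract above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' pipeline: it uses plain complex Fourier coefficients rather than the full Kuhl-Giardina elliptical formulation, and synthetic ellipses rather than fish outlines extracted from images.

```python
import numpy as np

def fourier_descriptors(points, n_coeffs=8):
    """Shape descriptors for a closed 2-D contour given as an (N, 2) array.

    Treats the contour as a complex signal and keeps normalised magnitudes
    of low-order Fourier coefficients: invariant to translation (DC term
    dropped), scale (divide by first harmonic) and rotation/start point
    (magnitudes only).
    """
    z = points[:, 0] + 1j * points[:, 1]
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                 # remove centroid -> translation invariance
    mags = np.abs(coeffs)
    mags /= mags[1]                 # divide by first harmonic -> scale invariance
    # keep n_coeffs positive- and negative-frequency harmonics beyond the first
    return np.concatenate([mags[2:2 + n_coeffs], mags[-n_coeffs:]])

def sample_ellipse(a, b, n=128):
    """n points on an axis-aligned ellipse with semi-axes a and b."""
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.column_stack([a * np.cos(t), b * np.sin(t)])
```

      Two circles of different sizes yield near-identical descriptors, while an elongated ellipse differs in its negative-frequency harmonics, so even a simple nearest-centroid classifier separates the shapes; a real pipeline would first extract contours from segmented specimen images.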

  • 9:40-10:00 #3: Who’s in My Archive? An End-to-End Framework for Automatic Annotation of TV Personalities (S01206)
    Maurizio Montagnuolo, Fulvio Negro, Alberto Messina, Angelo Bruccoleri, Roberto Iacoviello [Centre for Research, Technological Innovation and Experimentation Rai Radiotelevisione Italiana Turin, ITALY]

    • PAPER SLIDES

      ABSTRACT: Knowledge about the presence of people in a video is a valuable source of information in many applications, such as video annotation, retrieval and summarisation. The contribution of this paper goes in the direction of demonstrating how AI-based face processing technologies can be profitably used to perform video annotation of television content. To validate our vision, we developed the Face Management Framework (FMF), which implements an end-to-end pipeline for face analysis and content annotation based on few-shot or zero-shot face embedding extraction models. The results of the test campaign of the system show that the key performance indicators that we defined were exceeded by a wide margin, demonstrating how media workflows could greatly benefit from the tool and the efficiency improvements it brings.
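      The few-shot matching step such a pipeline relies on can be sketched as follows. This is illustrative only: the FMF's actual embedding models and thresholds are not described in the abstract, and the class name and the 0.7 cosine threshold here are assumptions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

class FaceAnnotator:
    """Few-shot annotation: enroll a handful of reference embeddings per
    person, then label a new face embedding with the best-matching identity,
    or 'unknown' when no reference is similar enough."""

    def __init__(self, threshold=0.7):
        self.gallery = {}            # name -> list of reference embeddings
        self.threshold = threshold

    def enroll(self, name, embedding):
        self.gallery.setdefault(name, []).append(np.asarray(embedding, dtype=float))

    def annotate(self, embedding):
        embedding = np.asarray(embedding, dtype=float)
        best_name, best_sim = None, -1.0
        for name, refs in self.gallery.items():
            sim = max(cosine(embedding, r) for r in refs)
            if sim > best_sim:
                best_name, best_sim = name, sim
        return best_name if best_sim >= self.threshold else "unknown"
```

      The "unknown" fallback matters in archival annotation: it keeps unrecognised faces out of the index rather than forcing a wrong label.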


10:00 – 10:30 COFFEE BREAK


10:30 – 11:30 SESSION 2: Authenticity & Trust

  • 10:30-10:50 #4: Authenticating Citizen Journalism by Incorporating the View of Archival Diplomatics into the Verification of Open-source Investigators (S01211)
    Hoda Hamouda [School of Information (iSchool UBC) University of British Columbia, CANADA]

    • PAPER SLIDES

      ABSTRACT: Can archival science and diplomatics enhance our ability to authenticate YouTube citizen journalism videos captured in conflict-affected regions? This research explores the possibility of expanding the current process of human rights open-source investigators in verifying online videos by integrating authentication measures of archival diplomatics into the workflow of open-source investigators.

  • 10:50-11:10 #5: Will Blockchain Technology Change How Well National Archives Preserve the Trustworthiness of Digital Records?: Preliminary Results of a Survey (S01205)
    Özhan Saglik, Victoria Lemieux [Bursa Uludag University, Türkiye / University of British Columbia Vancouver, CANADA]

    • PAPER SLIDES

      ABSTRACT: The purpose of this study is to examine the viewpoint of national archives on blockchain and distributed ledger technologies, discover their activities in relation to the application of these technologies, and analyse their thoughts on how these technologies can play a role in the preservation of records’ trustworthiness. A survey method was adopted in the study. The survey consisted of 18 questions about national archives’ attitudes and actions in relation to the application of blockchain and distributed ledger technologies. The survey was sent to the 194 national archives listed in the Directory of National Archives. Eighteen responses were acquired, which, while low, provides initial insights into how national archives are responding to these technologies. This study has three hypotheses. The first is “blockchain technology will change archiving practices”, the second is “the trustworthiness of digital records can be preserved better with blockchain technology”, and the last is “national archives are reluctant to implement blockchain networks that use tradable crypto-assets”. According to the results obtained from the survey, the first hypothesis has not been verified. The second hypothesis is plausible: some national archives are keen to adopt blockchain and distributed ledger technologies, but a majority of the archives are hesitant to adopt these technologies for archiving, suggesting that the third and final hypothesis might also be true, though the reasons for national archives’ reluctance to adopt these technologies could be more varied than originally hypothesized. This study is one of the first systematic analyses of the viewpoints and activities of national archives on blockchain and distributed ledger technologies.

  • 11:10-11:30 #6: Analogous Analogues: Digital Twins and Hardware Tracking in GLAM Collections (S01208)
    Dian Ross, Edmond Cretu, Victoria Lemieux [Electrical and Computer Engineering University of British Columbia Vancouver, CANADA / School of Information University of British Columbia Vancouver, CANADA]

    • PAPER VIDEO SLIDES

      ABSTRACT: Galleries, Libraries, Archives, and Museums (GLAMs) are host to cultural treasures and historic records but face inherent challenges maintaining accessibility and traceability in their legacy collections. Rolling COVID-19 lockdowns over the past three years (2020-2023) have limited access to primary materials while user expectation of digital access to collections has grown. With renewed digital access, however, comes new challenges in authentication and provenance tracking: collection digitization and monitoring of cultural artefacts introduces new lines of work for institutions already constrained by budgets and staffing. Building upon our previous exploration of this topic, “NFTs: Tulip Mania or Digital Renaissance?”, we present a design solution for tracking and monitoring GLAM collection objects via a hardware controller with Trusted Execution Environment (TEE) that interfaces with a trusted and flexible digital twin ledger architecture, selected from our analysis of database and private ledger technologies. We conclude by outlining the physical threat model for this design: future work will expand this model to include digital (cyber) threats to GLAM collection objects and investigate credentialed queries.


11:30 – 12:30 SESSION 3: Emerging Challenges & Opportunities

  • 11:30-11:50 #7: Critical Community-Centeredness: Ethical Considerations for Computational Archival Studies (S01203)
    Madelynn Dickerson, Audra Eagle Yun [Digital Scholarship Services University of California, Irvine Libraries Irvine, USA / Special Collections & Archives University of California, Irvine Libraries Irvine, USA]

    • PAPER

      ABSTRACT: In this paper, we call for computational archival studies to prioritize social justice and community-centeredness. Our initial research findings, as well as the work of community archives, provide evidence of the need to elevate and truly center the voices of those depicted (or underrepresented) in large-scale digital archives, leveraging the power of computational thinking with the transformative experience of seeing oneself represented (or representing oneself) in digital collections.

  • 11:50-12:10 #8: Accelerating Precision Research and Resolution Through Computational Archival Science Pedagogy (S01204) 
    Sarah A. Buchanan, Jennifer L. Wachtel, Jennifer A. Stevenson [University of Missouri Columbia, USA / University of Maryland and National Archives and Records Administration, Washington, D.C., USA / Defense Threat Reduction Agency Fort Belvoir, USA]

    • PAPER VIDEO SLIDES

      ABSTRACT: Use of archival collections is accelerated by the presence of finding aids, which communicate the arrangement and description of collection contents. To arrive at the optimal arrangement of a collection, archivists rely on some item-level processing or knowledge gained by exploring and manipulating digital reproductions of the contents. In this paper we consider archival student and instructor perspectives from hands-on course experiences directly with two distinct collections: one pertaining to the development, 2017 transfer and launch, and ongoing maintenance of the International Research Portal for Records Related to Nazi-Era Cultural Property (IRP2), and one a selection of unclassified catalog entries about digitized nuclear science reports. Visualizing is a data practice that permits the discovery of key content patterns, identification of computational models to be carried out to aid further analysis, and query-resolution for subject experts with precise – and historically significant – research questions. While archival data visualizations have previously been implemented as an extension of descriptive work including finding aid element counts, here we connect visualization to the work of archival outreach and access. We study how visualizations generated by groups of students working with textual and numerical dataset portions can ultimately accelerate time-sensitive uses of collections.

  • 12:10-12:30 (virtual) #9: The Utility of Standards and Good Practice Guidelines for Records Professionals: Comparing Apples, Oranges, and Other Fruits (S01215) 
    Shadrack Katuu [University of South Africa, Pretoria, SOUTH AFRICA]

    • PAPER VIDEO SLIDES

      ABSTRACT: The perceived usefulness of standards and good practice guidelines (S&GPG) for records professionals is often seen as ambiguous. Many professionals find the abundance of options overwhelming and confusing. Even after selecting seemingly suitable S&GPG, their direct benefits may not always be evident and can potentially restrict professional autonomy in certain situations. This article explores various approaches employed by records professionals to understand the complexity of S&GPG, such as simple listing or ontological representation. However, each approach has its own set of constraints. The article proposes an initial meta-framework that draws on insights from successful frameworks to provide preliminary categories. The purpose of this conceptual proposal is to assist records professionals in understanding the connections between S&GPG.


12:30 – 2:00 LUNCH BREAK


2:00 – 3:00 KEYNOTE SPEAKERS: CAS in Action — case studies in computer vision and LLMs

  • Greg JANSEN: Developing Computer Vision and Machine Learning to Segment and Read Census Records  ** VIDEO
    • ABSTRACT: Greg will share his recent work on applying computer vision techniques to the digitized handwritten US Census forms from the 1950s. He will showcase the painstaking techniques often used for form segmentation and handwritten character recognition in order to create a workflow for extracting demographic information. Computer vision and machine learning require some deep technical knowledge, but draw on creative approaches and trial and error style experimentation to find a solution. Greg will guide us through his journey of setbacks and discovery as these paper images are gradually transformed into structured information.
    • BIO: Greg is a specialist in digital repositories and computational treatments for digital archives. He is part of the professional faculty at the School of Information Studies at the University of Maryland at College Park and has led diverse projects for the U.S. National Parks Service, the University of North Carolina at Chapel Hill, and the Institute for Museum and Library Services. He has contributed to many international open source software projects, including Fedora Commons, Trellis Linked Data Platform, and DRASTIC. His research interests include high scale digital platforms, computer vision, machine learning, and digital preservation.

  • Rajesh Kumar GNANASEKARAN: Conversing with the Past through Legacy of Slavery Records using OpenAI’s GPT-3 LLM ** VIDEO
    • ABSTRACT: This project explores the use of AI and ML using the OpenAI GPT-3 Large Language Model (LLM) to facilitate the analysis and visualization of newspaper advertisements from the Maryland State Archives related to the trading of enslaved people. The study focuses on the Domestic Traffic Ads (DTA) collection of the State of Maryland between 1824 and 1864, which exposes chattel slavery practices where buyers and sellers would interact to exchange and share human beings, often for social and domestic benefit. This case study is part of a larger project to explore computational treatments to remember the Legacy of Slavery (CT-LoS) towards reasserting erased memory. Previous studies have included computational treatments for Manumissions, Certificates of Freedom, and Runaway Slave Ads. Our approach is mindful of the social and ethical concerns that arise from using LLMs and the sensitivities related to working with slavery data. In this context, we develop a chatbot using OpenAI’s GPT-3 LLM, which we call “ChatLoS,” as a querying tool, fine-tuned to be contextually aware of the DTA dataset, and demonstrate surprisingly precise and quantitative results.
    • BIO: Rajesh Kumar Gnanasekaran is a research fellow at the Advanced Information Collaboratory (AIC) and a doctoral candidate at the U. Maryland iSchool. His research centers on culturally rich dataset collections, whether digitally archived or born-digital. He explores, analyzes, and applies computational treatments to these collections using data science approaches such as machine learning, artificial intelligence, natural language processing, and graph databases, visualizing raw data to unravel the narratives of the entities involved, especially those that are not well represented in the literature. To achieve this, he collaborates with experts from interdisciplinary backgrounds and incorporates their feedback.


3:00 – 4:00 SESSION 4: Generative AI and LLMs

  • 3:00-3:20 #10: Can GPT-4 Think Computationally about Digital Archival Practices? (S01213)
    William Underwood, Joan Gage [College of Information Studies, University of Maryland, College Park, MD, USA / Paul D West Middle School, Fulton County Schools, East Point, GA, USA]

    • PAPER — SLIDES

      ABSTRACT: This paper describes an investigation of GPT-4’s knowledge in some areas of archival practice, and its ability to think computationally about archival tasks. It is demonstrated that GPT-4 shows an understanding of ten of the twenty-two distinct forms of computational thinking. When GPT-4 is combined with plugins, it is able to apply some of these methods and tools to digital archival tasks.

  • 3:20-3:40 (virtual) #11: Exploring the Application of Large Language Models in Detecting and Protecting Personally Identifiable Information in Archival Data: A Comprehensive Study (S01207)
    Jianliang Yang, Xiya Zhang, Kai Liang, Yuenan Liu [School of Information Resource Management Renmin University of China Beijing, CHINA  / Digital Archives Management Office Hangzhou Archives Zhejiang, CHINA]

    • PAPER VIDEO

      ABSTRACT: This comprehensive study investigates the application of Large Language Models (LLMs) for detecting and protecting Personally Identifiable Information (PII) in archival data, a pressing concern for archives under the mandate to increase public access while safeguarding personal privacy. The paper juxtaposes traditional supervised learning methods against LLMs’ unsupervised capabilities in PII detection, unveiling LLMs as viable alternatives capable of achieving satisfactory performance levels without the need for extensive training datasets. Through empirical analysis, the study validates the feasibility of LLMs in identifying sensitive information within large volumes of archival material. The findings highlight LLMs’ significant interpretability, providing understandable rationale behind PII identification—a feature that not only enhances trust in AI applications but also aids archival staff in the review process. This research contributes novel insights into the intersection of AI and archival science, presenting LLMs as powerful tools for addressing the twin challenges of data accessibility and privacy.
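      A common zero-shot prompting pattern of the kind the paper evaluates can be sketched like this. The prompt wording, JSON schema, and defensive parsing are illustrative assumptions, not the authors' protocol, and the model call itself is omitted.

```python
import json

PROMPT_TEMPLATE = (
    "Identify every piece of personally identifiable information (PII) "
    "in the text below. Reply with a JSON list of objects with fields "
    "'span', 'category', and 'rationale'.\n\nText:\n{text}"
)

def build_pii_prompt(text):
    """Fill the zero-shot PII-detection prompt with a passage of archival text."""
    return PROMPT_TEMPLATE.format(text=text)

def parse_pii_response(raw):
    """Extract the JSON list from a model reply, tolerating surrounding prose."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
```

      The parsed spans could then be redacted or flagged for archivist review; asking for a 'rationale' field mirrors the interpretability benefit the abstract highlights, since reviewers see why each span was flagged.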

  • 3:40-4:00 (virtual) #12: AI-Generated Images as an Emergent Record Format (S01212)
    Jessica Bushey [School of Information San José State University San José, USA]

    • PAPER VIDEO

      ABSTRACT: AI-generated images are disrupting existing approaches to verifying the trustworthiness of visual media. The application of generative AI in fields in which images are trusted visual evidence of persons, actions and events is drawing the attention of archival scientists and AI researchers. A literature review of AI-generated images as an emergent record format identified an absence of archival and recordkeeping knowledge. Analysis of the results revealed six thematic categories: authenticity and verifiability; manipulation and misinformation; bias and representation; attribution and intellectual property; transparency and explainability; and ethical considerations. These themes inform the development of research questions and the next phase of the study, which includes the application of theory and methods of archival diplomatics and computational archival science.


4:00 – 4:30 COFFEE BREAK


4:30 – 5:00 INVITED SPEAKER (virtual): Emanuele FRONTONI — The InterPARES Trust AI Project’s PergaNet Case Study: Is Appearance-Based AI in the Future of Archival Science? ** VIDEO

    • ABSTRACT: This invited talk delves into the innovative intersection of archival science and artificial intelligence (AI), examining the potential future trajectory of archival practices through the lens of the InterPARES Trust AI Project’s PergaNet Case Study. Archival institutions globally strive to preserve records of various entities, ensuring their endurance as cultural heritage and accountable historical references. The rapid digitalization of parchment-based documents, pivotal in human communication history, has elevated their preservation and analysis to a critical research area in image and pattern recognition. PergaNet emerges as a groundbreaking solution in this domain, utilizing a lightweight deep learning (DL) system for the historical reconstruction of ancient parchments through appearance-based methodologies. This AI-driven approach is increasingly essential for the effective analysis of ancient image data. PergaNet’s core objective is the automatic processing of vast volumes of scanned parchments, a challenge not yet fully explored due to the novelty of parchment scanning technologies and the critical need for data recovery from deteriorating historical documents.
      • The talk will explore PergaNet’s tri-phasic process: classifying parchments’ recto/verso sides, detecting text, and recognizing the “signum tabellionis.” This system not only identifies and classifies objects within images but also pinpoints their precise locations. Significantly, PergaNet’s analysis relies on ordinary usage data, abstaining from any data manipulation or alteration techniques.
      • This approach has profound implications for archival science, potentially revolutionizing how archives maintain, analyze, and interpret historical documents without relying on OCR and textual analysis.
      • The session will critically assess whether appearance-based AI systems like PergaNet signify the future of archival science, offering insights into the evolving role of AI in preserving our cultural and historical legacy.
    • BIO: Emanuele Frontoni is a Full Professor of computer science at the University of Macerata and the Co-Director of the VRAI Vision Robotics & Artificial Intelligence Lab. His research interests include computer vision and artificial intelligence with applications in robotics, video analysis, human behavior analysis, extended reality and digital humanities. He is the author of over 230 international articles and collaborates with numerous national and international companies in technology transfer and innovation activities. He has been Program Chair or General Chair of various international conferences and summer schools (e.g. IEEE/ASME MESA Mechatronic Embedded Systems & Applications 2016 and 2017, IEEE ECMR European Conference on Mobile Robotics 2017, BigDat 2020, DeepLearn 2021) and co-organizer of many international workshops (e.g. DeepRetail @ ICPR 2020, D2CH @ CVPR 2021, AI4DH @ ICIAP 2022). He is also involved in several national and international technology transfer projects in the fields of AI, deep learning, data interoperability, cloud-based technologies, big multimedia data analysis, extended reality and digital humanities. He is a member of the European Association for Artificial Intelligence, the European AI Alliance, and the International Association for Pattern Recognition. He has served as an expert for the EU Commission in the AI H2020 and Horizon Europe calls and is currently co-speaker of the European IPCEI CIS (Important Project of Common European Interest – Cloud Infrastructure and Services) for the AI services of the next generation of European cloud-edge services.


5:00 – 5:30 SESSION 5: Discussion and Closing

  • In Memoriam one year ago… our friend and CAS collaborator Michael Kurtz:
    “One of the pulls to the bright side is our CAS initiative. Not only is it intellectually compelling to me, but I feel I am part of an endeavor that will help others in the archival space and beyond. To be even more blunt, I am so curious to see what happens next as it makes me want to push the boundaries of the time that I have left!”
Photo taken on Friday, Dec. 16, 2022.
    • Michael launched the CAS initiative in 2016, with Victoria Lemieux, Mark Hedges, Maria Esteva, William Underwood, Mark Conrad, and Richard Marciano [LINK].

    6:00 – 9:00 BANQUET


    IMPORTANT DEADLINES:

    • Monday, Nov. 6, 2023 (final): Due date for full workshop papers submission
    • Wednesday, Nov 15, 2023: Notification of paper acceptance to authors
    • Wednesday, Nov 22, 2023 (hard deadline): Camera-ready of accepted papers
    • Sunday, Dec 17, 2023: Day-long CAS workshop (in person) in Sorrento, IT
      • If you are planning on attending the workshop, please contact organizers for registration details!



    COMPUTATIONAL ARCHIVAL SCIENCE: digital records in the age of big data

    INTRODUCTION TO WORKSHOP [also see our CAS Portal]:

    The large-scale digitization of analogue archives, the emerging diverse forms of born-digital archive, and the new ways in which researchers across disciplines (as well as the public) wish to engage with archival material, are resulting in disruptions to traditional archival theories and practices. Increasing quantities of ‘big archival data’ present challenges for the practitioners and researchers who work with archival material, but also offer enhanced possibilities for scholarship, through the application both of computational methods and tools to the archival problem space and of archival methods and tools to computational problems such as trusted computing, as well as, more fundamentally, through the integration of computational thinking with archival thinking.


    Our working definition of Computational Archival Science (CAS) is:

      • A transdisciplinary field that integrates computational and archival theories, methods and resources, both to support the creation and preservation of reliable and authentic records/archives and to address large-scale records/archives processing, analysis, storage, and access, with the aim of improving efficiency, productivity and precision, in support of recordkeeping, appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material.

    OBJECTIVES

    This workshop will explore the conjunction (and its consequences) of emerging methods and technologies around big data with archival practice (including record keeping) and new forms of analysis and historical, social, scientific, and cultural research engagement with archives. We aim to identify and evaluate current trends, requirements, and potential in these areas, to examine the new questions that they can provoke, and to help determine possible research agendas for the evolution of computational archival science in the coming years. At the same time, we will address the questions and concerns scholarship is raising about the interpretation of ‘big data’ and the uses to which it is put, in particular appraising the challenges of producing quality (meaning, knowledge and value) from quantity, tracing data and analytic provenance across complex ‘big data’ platforms and knowledge production ecosystems, and addressing data privacy issues.

    This will be the 8th workshop at IEEE Big Data addressing Computational Archival Science (CAS), following on from workshops in 2016, 2017, 2018, 2019, 2020, 2021 and 2022. It also builds on three earlier workshops on ‘Big Humanities Data’ organized by the same chairs at the 2013-2015 conferences, and more directly on a symposium held in April 2016 at the University of Maryland.

    All papers accepted for the workshop will be included in the Conference Proceedings published by the IEEE Computer Society Press. In addition to standard papers, the workshop (and the call for papers) will incorporate a student poster session for PhD and Master’s level students.


    RESEARCH TOPICS COVERED:
    Topics covered by the workshop include, but are not restricted to, the following:

      • Application of analytics to archival material, including AI, ML, text-mining, data-mining, sentiment analysis, network analysis.
      • Analytics in support of archival processing, including e-discovery, identification of personal information, appraisal, arrangement and description.
      • Scalable services for archives, including identification, preservation, metadata generation, integrity checking, normalization, reconciliation, linked data, entity extraction, anonymization and reduction.
      • New forms of archives, including Web, social media, audiovisual archives, and blockchain.
      • Cyber-infrastructures for archive-based research and for development and hosting of collections
      • Big data and archival theory and practice
      • Digital curation and preservation
      • Crowd-sourcing and archives
      • Big data and the construction of memory and identity
      • Specific big data technologies (e.g. NoSQL databases) and their applications
      • Corpora and reference collections of big archival data
      • Linked data and archives
      • Big data and provenance
      • Constructing big data research objects from archives
      • Legal and ethical issues in big data archives

    PROGRAM CHAIRS:
    Dr. Mark Hedges
    Department of Digital Humanities (DDH)
    King’s College London, UK

    Prof. Victoria Lemieux
    School of Information
    University of British Columbia, CANADA

    Prof. Richard Marciano
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA


    PROGRAM COMMITTEE MEMBERS:
    Dr. Bill Underwood
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA

    Dr. Jane Greenberg
    Alice B. Kroeger Professor and Director, Metadata Research Center
    College of Computing & Informatics
    Drexel University, USA

    Mark Conrad
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA

    Gregory Jansen
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA

    Rajesh Kumar Gnanasekaran
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA

    Lori Perine
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA

    Jennifer Proctor
    Advanced Information Collaboratory (AIC)
    College of Information Studies
    University of Maryland, USA