Conference Theme

Collections as Data / Data as Collections

Researchers and the general public are interacting with cultural memory collections in new ways, accessing and reusing both digital objects and collections descriptions in large-scale inquiries. These inquiries demand new perspectives and utilise computer-driven methodologies that may prove difficult to share openly and preserve. The work that collections professionals are doing to deliver and safeguard cultural heritage resources is also changing, prompting a reassessment of digitisation workflows and the development of digital tools that allow researchers to visualise and study collections at scale. As a result, we are building a new collaborative digital culture that will fundamentally reshape both collections work and research practice. How might we envision the roles, opportunities, and challenges in delivering and preserving collections as data alongside the research outputs that are generated from them?

What is ‘Collections as Data’?

‘Collections as Data’ comes from the idea that digital information about collections (including, but not limited to, metadata records, digital files, software, code and other digital documentation) can serve as data for computationally driven research enquiries. The Collections as Data movement encompasses a range of approaches to facilitating collection reuse in activities such as “text mining, computer vision, machine learning, artificial intelligence, data visualisation, mapping, image analysis, audio analysis and network analysis… While the specifics of how to develop, provide access to, and support the use of collections-as-data will vary, any digital material can be potentially made available as data that are amenable to computational reuse.” (Vancouver Statement on Collections as Data)

Resources are emerging to guide cultural heritage professionals in this work. The Digital Repository of Ireland recently published a set of recommendations for improving the interoperability and reuse of memory collections as data for the WorldFAIR Project, a European-funded initiative aimed at improving global cooperation on FAIR (findable, accessible, interoperable and reusable) data policy and practice. The International GLAM Labs Community produced a checklist for publishing collections as data, which informed the draft workflow for producing collections as data now available in the Social Sciences and Humanities Open Cloud (SSHOC) Marketplace. At the same time, a Europeana Working Group has released a template for Datasheets for Digital Cultural Heritage Datasets, and a workflow has been developed to assess the quality of data generated from GLAM collections using Jupyter Notebooks. But much work is still needed to support the responsible sharing, reuse, and preservation of collections as data.

We envision ‘Data as Collections’ as an opportunity for a reciprocal voice in the Collections as Data movement, acknowledging that the evolving principles (see again, the Vancouver Principles) guiding this field of work call for participatory design and prioritise social as well as technical interoperability. Collaboration is needed between researchers, data specialists, computer scientists, and collections professionals, where cultural heritage professionals are recognised as partners in the research process and data outputs are essential contributions to the cultural heritage record (as per ‘Cultural Heritage Data from a Humanities Research Perspective: A DARIAH Position Paper’).

With ‘Data as Collections’ we also highlight the preservation implications for the data produced, shared, reused, and re-deposited as part of the research process. Should collections be stored, managed and preserved differently from the processed or analysed datasets that support research projects and papers? Is it time to consider a FAIR+ environment for collections and research, spotlighting the critical value of preservation and putting a stronger focus on a broader commitment to ethical re-sharing as called for in the CARE principles? (FAIR + Time: Preservation for a Designated Community) We also invite considerations of the social and environmental impact of preserving not just the collections, but also the data outputs that emerge from these resources.

For more information about DPASSH, please email