Dissertation Title Representing Born-Digital Literary Archives: from the Filesystem to the Knowledge Graph
Abstract The growing production of born-digital archives by contemporary authors generates complex documentary ecosystems that challenge traditional archival descriptive tools. Although the international archival community is increasingly oriented towards Linked Open Data (LOD), semantic models capable of addressing the specificities of literary born-digital materials remain lacking. This research proposes an approach to the representation of born-digital literary archives articulated across four levels: the analysis of authorial practices; the design of a semantic model; the implementation of automated description workflows; and the development of tools for analysis and access.
The phenomenological investigation involved fifty finalists of the Premio Strega and Premio Campiello literary awards, as well as an in-depth examination of the Valerio Evangelisti Archive – the most extensive Italian case study documented to date. The inquiry revealed recurring management patterns, strategies of digital self-representation, and forms of "archival will," alongside a generally limited awareness of documentary value among the authors surveyed. From this empirical analysis, five modelling requirements were identified: the representation of the physical, logical, and conceptual layering of digital materials; the integration of cryptographic integrity measures; the representation of native metadata; and the documentation of provenance and contextual relationships.
On these bases, the Born-Digital Ontology (BoDi) was developed as an extension of the Records in Contexts Ontology (RiC-O). An automated five-phase workflow was subsequently implemented to convert born-digital archives into RDF graphs compliant with BoDi. When tested on the Evangelisti Archive, the process generated over 60 million triples, making explicit the structures, metadata, and relationships needed to transform an extensive and opaque documentary corpus into a formalised knowledge base. Advanced querying through SPARQL and visualisation via a dedicated application enabled both sophisticated interrogation and accessible exploration at multiple levels of engagement.
The convergence of phenomenological inquiry, ontological modelling, automation, and visualisation delineates an integrated and replicable methodological framework for the representation and enhancement of born-digital literary archives as part of contemporary cultural heritage.
Supervisor Francesca Tomasi
Co-supervisors Paolo Bonora and Paola Italia
Keywords Born-Digital Archives; Literary Archives; BoDi; Born-Digital Ontology; Archival Representation; Archival Description; Semantic Web; Linked Open Data; Records in Contexts; Born-Digital Heritage; Valerio Evangelisti
Arcangelo Massari
Dissertation Title HERITRACE: enabling domain expert participation in semantic data curation with integrated provenance and change tracking
Abstract Cultural heritage institutions increasingly adopt Semantic Web technologies for FAIR compliance. In cultural heritage, semantic data is inherently interpretative and requires human curation, yet technical complexity prevents domain experts from contributing. This thesis addresses the usability gap through two research questions: RQ1 asks how to design interfaces enabling domain experts to curate RDF data without requiring technical expertise while maintaining provenance and change tracking; RQ2 asks how technical staff can perform one-time configuration of curation environments for specialized domains.
Two case studies serve distinct roles: OpenCitations Meta validates that barriers exist at scale in systems processing 124 million entities; ParaText serves as a testbed for guerrilla testing. These reveal five convergent requirements: provenance management, change tracking, usability for both domain experts and configurators, flexible customization, and integration with existing RDF collections.
HERITRACE addresses these requirements through a framework built on the OpenCitations Data Model for provenance and change tracking, with the Time Agnostic Library enabling reconstruction of past entity states. Technicians use familiar SHACL shapes and YAML rules to configure entity types, validation constraints, and display settings; the framework then generates interfaces from these configurations. End users create, modify, delete, and merge entities and restore previous states through version history, without awareness of the underlying RDF infrastructure. Evaluation with 9 end users and 10 technicians combines quantitative measures and grounded theory analysis. For RQ1, end users completed curation tasks with 67% to 100% success and above-average usability (SUS 78.9). For RQ2, technicians achieved 90% success with excellent usability (SUS 83.8).
Since most institutions maintain collections in relational databases, preliminary research explores extending RML to enable inverse transformations: RDF serves as a lingua franca, with HERITRACE curating semantic data while inverse mappings transfer modifications back to original databases, allowing institutions to adopt FAIR principles without changing their existing infrastructure.
Supervisor Silvio Peroni
Co-supervisor Anastasia Dimou
Keywords FAIR; Usability; Provenance; Change-Tracking; Cultural Heritage