Combining FAIR principles and long-term archival of 3D data

This paper presents an update of the French solution for long-term archiving combined with online publication of 3D research data in the humanities. The choice of data organisation, metadata, standards and infrastructure is in line with the FAIR principles of the semantic web. Since 2017, our consortium, 3D for Humanities, has offered this service through the French National 3D Data Repository for Humanities. This was the start of our collaboration with CINES (Centre Informatique National de l'Enseignement Supérieur), the standard Open Archival Information System (OAIS) infrastructure for research data in France. In 2021, with more than one thousand projects registered, driven by laboratories from all over the country, open questions were posed to the community. In response, we have developed a new metadata schema that allows a more precise description of the research content, greater openness to other non-archaeological humanities fields, and better FAIR compliance. This metadata schema is aligned with standard vocabularies and mapped to the Europeana Data Model (EDM). Among other online features, a 3D viewer has been implemented to meet the needs of researchers and public communication. As designed, aLTAG3D, the desktop UI software we developed to help research labs create Submission Information Packages (SIPs), adapts to the new schema through the XSD content.


INTRODUCTION

1.1 Domain-specific issues
In recent decades, 3D recordings, such as 3D digitization, modeling or reconstruction, have played a growing role in the humanities. In this broad field of research, the volume of 3D data has increased, as have its importance and the trust users place in it. For the majority of rescue archaeology projects, records contain scientific evidence and traces of elements that may be lost forever. This work raises many expectations in the French community. Long-term archiving of completed projects is currently carried out mainly locally by the laboratories, using whatever IT resources are available, which are sometimes poor.
The Huma-Num consortium "3D for humanities" aims to federate the community of 3D producers and users in the context of the Human and Social Sciences. The consortium is based on a network of collaborators from more than 20 laboratories and research teams in France. It involves national infrastructures such as Huma-Num and CINES (Centre Informatique National de l'Enseignement Supérieur). It provides recommendations covering the whole chain, from the creation of such data to its publication and archiving. It is developing the national 3D data repository for Human Sciences, together with the required metadata schema and companion software that simplifies the documentation process. It aims to implement the FAIR principles [Wilkinson et al. 2016].

Proposal
Since 2017, a fully integrated solution for the long-term archiving and online publication of 3D data in the humanities has been available to researchers in France. The service is divided into a web interface called "The French National 3D Data Repository for Humanities" (CND3D in French, https://3d.humanities.science) for publication purposes and a desktop application called a Long Term Archive Generator for 3D (aLTAG3D, https://altag3d.huma-num.fr) for archive package creation. During 5 years of successful service, the provenance of the 3D data has gradually broadened, initially focusing on archaeological issues, then opening up to cultural heritage elements in general, and more recently to all areas of the humanities: psychology, theatre, literature, etc. To accommodate these new types of data and their description, domain-specific metadata, data production processes and data types have been updated. The absence of any physical source object creates a major conceptual gap between heritage fields and these new disciplines. Figure 1 describes the macro process now used by all the French communities when digitizing, modeling or manipulating virtual heritage objects.
In this article, we present a new metadata schema, a more complete and exhaustive documentation of this schema, a mapping of this schema to the Europeana Data Model (EDM), and the integration of these new descriptors into our two existing tools: CND3D, our online platform, and aLTAG3D, our desktop tool.

Scope of application
Our proposal applies only to completed and closed projects. The archive can be built up over time with other tools (ArcheoGrid, for example), but we only process data once its content has stabilized. Figure 2 shows where the tools presented here, aLTAG3D and CND3D, sit in the data lifecycle. Documentation of unfinished 3D models is possible, but it is advisable to limit long-term preservation and documentation to completed models, so that storage space is reserved for scientifically validated data.

RELATED WORKS

2.1 Metadata schema
Since the 2000s, numerous vocabularies have been developed, sometimes at a high level, sometimes more specific to the fields of archaeology or cultural heritage. CIDOC-CRM [Doerr 2005] is the reference for the description of data in these fields. Vocabularies dedicated to cultural heritage or other disciplinary fields (psychology, theatre, etc.) complement our work. These vocabularies provide a precise description of the content in terms of concepts and relationships between these concepts. These descriptors and their associated semantics are not included in our proposal. However, CIDOC-CRM, which is very complete, covers our entire schema at the top level. A mapping to CIDOC-CRM and DublinCore ensures strong interoperability with our records. Specialized schemas for the description of 3D data have also been published. At the crossroads of the humanities and 3D, CARARE [D'Andrea and Fernie 2013b], from the 3D ICONS project [D'Andrea and Fernie 2013a], offers a very flexible and lightweight metadata schema, widely recognized at European level. CARARE is a bundle of other standards (EDM, MIDAS, LIDO, DublinCore). This schema is too light for our archiving constraints, which are based on the file content. Nevertheless, the high-level and administrative parts of our schema, inspired by CARARE, overlap with its requirements. Thus, as in [Niccolucci et al. 2022], our schema remains interoperable through a mapping. In addition, the Europeana Data Model (EDM) has inspired the architecture of our schema, with its core and contextual classes.
Mappings to reference schemas for disciplines other than archaeology and heritage will be defined as needs arise and data is ingested, depending on the needs, uses and skills of the members of the emerging disciplines in our consortium. Other cross-discipline schemas have been developed to describe the data production process [Belhajjame et al. 2013; Dudek et al. 2015; Homburg et al. 2021]. These descriptors are not included in the proposed version, but we are currently working on integrating this information explicitly into our model. Finally, we are required to supply some basic descriptors for the creation of the SIP at CINES [CINES, 2013]. These elements are therefore included in our schema.

Tools and workflows
The Community Owned digital Preservation Tool Registry (COPTR, https://coptr.digipres.org) initiative of COST Action SEADDA (CA18128), combined with the Digital Research Infrastructure for the Arts and Humanities SSH Open MarketPlace (DARIAH SSHOM, https://marketplace.sshopencloud.eu), helped us to position our tools and digital preservation workflows among other European initiatives. Of the 594 archiving tools listed by the COPTR project and the 61 tools dedicated to archiving, only our 2 tools, aLTAG3D and CND3D, can create packages for archiving 3D data. Besides this core functionality, our aLTAG3D tool appears to be the only XML visual programming editor available.

THE PACKAGE

3.1 Architecture of the metadata schema
The metadata schema we propose is organized into 5 core classes and 3 contextual ones (see Figure 3). It is explicitly formalized in an XSD file. The 5 core classes are the administrative parts, the 3D object, the physical object, the sources, and the computed outputs. The schema is centred on the 3D object. This object may use input sources (photogrammetry, image archive, etc.) and must be composed of outputs, the "computed and interpreted data", containing at least one 3D file. The object may represent a physical object. Shared contextual classes (complex types) "actor", "date", and "software" (and their subtypes) are used across the schema for different entries.
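To make the organisation described above more concrete, the following is a minimal sketch of the five core classes and three contextual types as Python dataclasses. All class and field names are hypothetical illustrations; the authoritative names and cardinalities are those of the XSD file, which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# NOTE: every name below is hypothetical; the real ones are defined in the XSD.

# --- Contextual types, shared across the schema ---
@dataclass
class Actor:
    name: str
    role: str = ""

@dataclass
class DateInfo:
    value: str  # e.g. an ISO 8601 date string

@dataclass
class Software:
    name: str
    version: str = ""

# --- Helper: a file that is either local to the package or an external URI ---
@dataclass
class FileRef:
    location: str
    is_3d: bool = False

# --- Core classes ---
@dataclass
class Administrative:
    depositor: Actor
    deposit_date: DateInfo

@dataclass
class PhysicalObject:
    name: str

@dataclass
class Source:
    description: str
    files: List[FileRef] = field(default_factory=list)

@dataclass
class ComputedOutput:
    description: str
    files: List[FileRef] = field(default_factory=list)
    software: List[Software] = field(default_factory=list)

@dataclass
class Object3D:
    """Central class: the schema is centred on the 3D object."""
    title: str
    outputs: List[ComputedOutput]
    sources: List[Source] = field(default_factory=list)  # optional inputs
    represents: Optional[PhysicalObject] = None          # optional link

    def has_3d_output(self) -> bool:
        # True when at least one output file is flagged as 3D.
        return any(f.is_3d for out in self.outputs for f in out.files)

@dataclass
class Deposit:
    administrative: Administrative
    object_3d: Object3D
```

In this sketch, a purely virtual reconstruction with no real-world counterpart would simply leave `represents` unset, which reflects the optional link between the 3D object and a physical object.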
With the new architecture of the schema, the expression of actors, locations and dates has been moved into separate classes (Actor, Location, Date) that can be grouped together in the "event" class. These classes map more easily to the EDM contextual classes Agent, Place and TimeSpan. The new physicalObject class helps map data to the EDM ProvidedCHO class by keeping the metadata related to the physical object separate. To map to the EDM WebResource class, we align metadata from our 3dObject class with other information from the deposit part of the schema. Concept information is provided through an interoperable thesaurus, PACTOLS. Finally, most of the information in the EDM Aggregation class is computed from information provided by the deposit management class (dm_Rights, ...), as seen in Figure 4. The final mapping document will be made available online as soon as Europeana has validated it.
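The class-level correspondences named in this paragraph can be summarised as a small lookup table. This is only an illustrative sketch: the dictionary keys "depositManagement" and "thesaurusTerm" are hypothetical labels, the namespace prefixes follow the usual EDM conventions, and the property-level detail belongs to the mapping document awaiting validation.

```python
from typing import Dict, List, Tuple

# Class-level correspondences as described in the text.
# Keys marked "hypothetical" are illustrative labels, not schema identifiers.
EDM_MAPPING: Dict[str, str] = {
    "Actor": "edm:Agent",
    "Location": "edm:Place",
    "Date": "edm:TimeSpan",
    "physicalObject": "edm:ProvidedCHO",
    "3dObject": "edm:WebResource",          # completed with deposit information
    "depositManagement": "ore:Aggregation",  # hypothetical key
    "thesaurusTerm": "skos:Concept",         # hypothetical key (PACTOLS terms)
}

def map_classes(schema_classes: List[str]) -> List[Tuple[str, str]]:
    """Return (schema class, EDM class) pairs, rejecting unmapped classes."""
    unmapped = [c for c in schema_classes if c not in EDM_MAPPING]
    if unmapped:
        raise ValueError(f"no EDM mapping recorded for: {', '.join(unmapped)}")
    return [(c, EDM_MAPPING[c]) for c in schema_classes]
```

For example, `map_classes(["Actor", "physicalObject"])` returns the pairs for Agent and ProvidedCHO, while a class absent from the table raises an error instead of being silently dropped.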

Archive content
"Sources" and "computed and interpreted data" may refer to files included in the package or to external URIs maintained by other institutions. The package focuses on files (local or URI) associated with the user's input description. The local files in the archive package must pass the CINES eligibility test (https://facile.cines.fr), and they can only be of three kinds:
(1) The raw sources: scientific evidence and sensor data (which may not be 3D).
(2) The final outputs of the project: articles, 3D reconstructions, etc. (at least one of which must be 3D).
(3) The paradata: any related contextual documentation file (contract, scanning procedure, etc.).
No intermediate products are kept.
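The content rules above can be expressed as a simple check on a package manifest. The sketch below is an assumption-laden illustration: the `Entry` structure, the `Kind` labels and the function names are hypothetical, and the format eligibility test itself is not reimplemented (it is performed by the CINES service).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Kind(Enum):
    # The three kinds of file allowed in a package (labels are hypothetical).
    SOURCE = "source"      # raw evidence and sensor data
    OUTPUT = "output"      # final results of the project
    PARADATA = "paradata"  # contextual documentation

@dataclass
class Entry:
    location: str          # local path in the package, or an external URI
    kind: Kind
    is_3d: bool = False

    @property
    def is_external(self) -> bool:
        return self.location.startswith(("http://", "https://"))

def check_manifest(entries: List[Entry]) -> List[str]:
    """Return a list of human-readable problems (empty when none found)."""
    problems = []
    if not any(e.kind is Kind.OUTPUT and e.is_3d for e in entries):
        problems.append("at least one final output must be a 3D file")
    return problems

def files_needing_eligibility_test(entries: List[Entry]) -> List[str]:
    """Local files only: external URIs are maintained by other institutions."""
    return [e.location for e in entries if not e.is_external]
```

Because intermediate products have no corresponding `Kind`, they cannot be declared in such a manifest at all, which mirrors the rule that they are not kept.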
In the metadata, to leverage the potential of interoperability, free-text user input fields are replaced where possible by standard datatypes (e.g. xsd:date), closed lists (e.g. ISO 639-2 or custom lists) or standard thesauri (e.g. PACTOLS).
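As an illustration of why typed and closed-list fields are easier to validate than free text, here is a minimal sketch of two field checks. It makes simplifying assumptions: the date check accepts only the plain YYYY-MM-DD form (xsd:date also allows a timezone suffix), and the language list is a tiny hand-picked subset of ISO 639-2 codes, not the full standard.

```python
import re
from datetime import date

# A small illustrative subset of ISO 639-2 codes; the full list is much longer.
ISO639_2_SUBSET = {"fre", "eng", "ger", "ita", "spa"}

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_valid_date(value: str) -> bool:
    """Approximate xsd:date check: YYYY-MM-DD naming a real calendar day."""
    if not _DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

def is_valid_language(code: str) -> bool:
    """Closed-list check against the illustrative subset above."""
    return code in ISO639_2_SUBSET
```

A value such as "June 2021" or "french" is rejected by these checks, whereas in a free-text field it would be stored as-is and left for each consumer to interpret.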

Documentation and tool integration
aLTAG3D [Dutailly et al. 2023] reads an XSD file and provides a visual programming interface to create and organize XML elements. The user input is validated against the XSD in real time. This flexibility allows the user interface to be updated easily for the new schema, with very few changes on the software side. Many automated processes reduce the amount of information that the researcher has to fill in manually. In Figure 5, red boxes indicate mandatory data in the updated metadata schema.
CND3D publishes part of the package in an online format. The update of the metadata schema will affect the online platform by providing more details when available. CND3D creates a DOI for the package and a web page with basic public information about it. It exposes the metadata via an OAI-PMH endpoint. Depending on the depositor's choice, it may display some of the package's 3D files online via its 3DHOP [M. et al. 2015] or Potree [Schütz 2016] viewer instance. CND3D stores the packages on secure servers for cold data (Huma-Num Box) and is the sole data provider for CINES.
Based on the XSD file, exhaustive documentation of the schema has been created (not yet online in production), so as to give everyone the possibility to create an SIP on their own, contribute, and create new tools or plugins for aLTAG3D.
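For readers unfamiliar with OAI-PMH, the sketch below shows how a harvester might form a standard `ListRecords` request and read record identifiers from a response. The endpoint URL and the sample response are placeholders invented for illustration; they are not the actual CND3D endpoint or its output. Only the `verb` and `metadataPrefix` parameters and the XML namespace come from the OAI-PMH protocol itself.

```python
from typing import List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for a given OAI-PMH base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def record_identifiers(response_xml: str) -> List[str]:
    """Extract the identifier of every record header in a response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]

# Placeholder response, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # "https://example.org/oai" is a hypothetical endpoint.
    print(list_records_url("https://example.org/oai"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens for large result sets, which this sketch leaves out.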

APPLICATION
The whole process, from 3D creation to long-term preservation and publication, has been in successful use for a few years now, thanks to the aLTAG3D software and the CND3D platform. Figure 6 shows the complete workflow. Once integrated into the CND3D platform, documented 3D data can be harvested by other portals such as OpenArchaeo (http://openarchaeo.huma-num.fr/explorateur/home) in France. Thanks to linked references in the new metadata schema, data can be cross-referenced between CND3D and other online publication portals. An example is shown in Figure 7.

CONCLUSION AND PERSPECTIVES
Dedicated to the 3D data produced by the humanities in France, this new metadata schema and its ecosystem of tools meet several challenges: greater openness to non-archaeological humanities fields, a more detailed and unambiguous definition of research files, better alignment with a semantic web FAIR perspective, clearer documentation, and better dissemination via machine-readable files (OAI-PMH endpoint plus EDM mapping) and an online 3D viewer for the general public. The next challenge is to port the existing records from the first schema to the new one. The advantage of opening

Figure 2: Data process and associated software.

Figure 3: The architecture of the metadata schema.

Figure 4: Detailed deposit part of the metadata schema.