15 ESA Project Results Repository
The ESA Project Results Repository (PRR) is a key initiative designed to ensure the long-term accessibility and preservation of project results generated within ESA EOP-S projects, such as those supported by the APEx and EarthCODE initiatives. This document outlines the general approach, design, and operational procedures for the ESA PRR, detailing how it will support the storage, accessibility, and dissemination of project results.
15.1 Main Functions of the PRR
The PRR offers two core functionalities:
- Data Hosting: Ensures that project results are stored and maintained in a way that is accessible for processing on other platforms.
- Catalogue: Provides a system to find the stored data and understand its relationships with other datasets.
Figure 15.1 provides an overview of the ESA PRR infrastructure and its connection to APEx.
At the time of writing, it is unclear how the infrastructure will be shared with other initiatives, such as EarthCODE. These open questions are discussed in Section 15.5 (Challenges).
15.2 Data Hosting
The project results will be stored within the ESA Cloud, a private cloud environment managed by ESA. The cloud provides Infrastructure as a Service (IaaS) resources, with storage currently limited to NFS. NFS storage is not inherently accessible via the public internet, so potential solutions, such as MinIO, are being considered to allow data access through HTTP links. This adds complexity to the solution, because most other cloud environments provide HTTP-accessible object storage, such as S3, out of the box.
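To illustrate what "data access through HTTP links" could look like, the sketch below builds a path-style URL for a result file exposed through an S3-compatible gateway such as MinIO. The endpoint, the bucket name, and the `project/path` key layout are hypothetical; none of them is defined by the PRR design.

```python
from urllib.parse import quote

# Hypothetical values: neither the endpoint nor the bucket name comes from the PRR design.
PRR_ENDPOINT = "https://prr.example.org"
PRR_BUCKET = "project-results"


def result_url(project: str, relative_path: str) -> str:
    """Build a path-style HTTP link for a result file behind an S3-compatible gateway."""
    # Assumed key layout: <project>/<path of the file inside the project>.
    key = f"{project}/{relative_path.lstrip('/')}"
    # quote() percent-encodes unsafe characters but keeps "/" as the path separator.
    return f"{PRR_ENDPOINT}/{PRR_BUCKET}/{quote(key)}"


print(result_url("demo-project", "maps/ndvi 2024.tif"))
```

A stable link of this kind is what a catalogue entry would reference as the asset location, regardless of how the files are stored on NFS behind the gateway.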
15.3 Catalogue
Each initiative using the ESA PRR, such as APEx and EarthCODE, is expected to set up its own STAC-based catalogue. APEx, for instance, will reuse its existing product catalogue instantiation service. The APEx catalogue will ingest metadata from the ESA PRR so that the project results stored there can be found and accessed through it.
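As an illustration of the kind of metadata a STAC-based catalogue holds, the sketch below assembles a minimal STAC Item (a GeoJSON Feature) whose asset points at a file hosted on the PRR. The function name, the identifier, and the URL are hypothetical, and a real APEx or EarthCODE record would likely carry more properties and extensions.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, href, bbox, acquired):
    """Return a minimal STAC Item describing one result file."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint as a closed polygon ring derived from the bounding box.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # STAC expects an RFC 3339 timestamp in UTC.
        "properties": {
            "datetime": acquired.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        # The asset href is where the catalogue points into the data hosting layer.
        "assets": {"data": {"href": href, "roles": ["data"]}},
    }


item = make_stac_item(
    "demo-result",                                   # hypothetical identifier
    "https://prr.example.org/demo/result.tif",       # hypothetical PRR link
    (4.0, 50.0, 5.0, 51.0),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(item, indent=2))
```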
15.4 User/Data flow
The integration process for project results into the ESA PRR and their subsequent availability in the APEx catalogue can be outlined in two scenarios.
15.4.1 Minimum flow
The Minimum Flow represents the most basic setup required to support the data ingestion process into the ESA PRR. This approach focuses on essential steps to ensure that project results are securely stored and made accessible for future use without additional layers of functionality or automation.
- Permission Request: The project requests permission from APEx and ESA to upload results to the ESA PRR.
- Staging Area Setup: APEx creates a staging area (ESA Cloud/OTC) and provides credentials.
- Data Upload: The project uploads the results, notifying APEx upon completion.
- Data Check: APEx verifies the data and metadata.
- Archiving: Approved data is archived in the PRR for long-term storage.
- Catalogue Ingestion: STAC elements are ingested into the APEx catalogue.
- Staging Area Removal: The staging area is removed after successful archiving.
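The steps above form a strictly ordered sequence, which can be sketched as a small state machine. The enum and function names are hypothetical, and the rule that a failed step is repeated (for example, a re-upload after a failed data check) is an assumption; the text does not specify error handling.

```python
from enum import IntEnum


class MinimumFlowStep(IntEnum):
    PERMISSION_REQUEST = 1
    STAGING_AREA_SETUP = 2
    DATA_UPLOAD = 3
    DATA_CHECK = 4
    ARCHIVING = 5
    CATALOGUE_INGESTION = 6
    STAGING_AREA_REMOVAL = 7


def next_step(current, succeeded=True):
    """Return the step that follows `current`, or None once the flow is finished."""
    if not succeeded:
        # Assumption: a failed step is repeated rather than skipped.
        return current
    if current is MinimumFlowStep.STAGING_AREA_REMOVAL:
        return None
    return MinimumFlowStep(current + 1)


# Walk through the whole flow in order.
step = MinimumFlowStep.PERMISSION_REQUEST
while step is not None:
    print(step.name)
    step = next_step(step)
```

Placing the staging area removal last means the uploaded copy stays available until both archiving and catalogue ingestion have succeeded.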
15.4.2 Full flow
The Full Flow offers a comprehensive end-to-end solution that leverages the full range of APEx capabilities. This approach integrates advanced functionalities provided by APEx, such as workspaces. The full scenario not only facilitates the ingestion of project results into the PRR but also supports their use within the broader APEx ecosystem.
- Workspace Request: The project requests a workspace creation via APEx.
- Approval & Creation: After ESA’s approval, APEx creates the workspace.
- Data Upload: The project uploads its results to the workspace.
- STAC Registration: The project creates and registers the STAC elements in the workspace.
- Final Validation: After validation by the project, APEx is notified for archival.
- Archiving: APEx archives the results into the PRR for long-term storage.
- Ingestion into Catalogue: APEx ingests the STAC elements into its general catalogue.
- Post-Project: Upon project completion, the workspace is deleted, but the data and catalogue entries persist.
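The key property of this flow is that results and catalogue entries outlive the workspace they were prepared in. The sketch below models that with in-memory dictionaries; the class names, method names, and the `project/path` key layout are hypothetical stand-ins, not APEx APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    project: str
    files: dict = field(default_factory=dict)        # path -> file content
    stac_items: dict = field(default_factory=dict)   # item id -> STAC item


@dataclass
class ResultsRepository:
    archive: dict = field(default_factory=dict)      # long-term storage (PRR)
    catalogue: dict = field(default_factory=dict)    # general catalogue

    def archive_workspace(self, ws):
        """Copy validated results and STAC items out of the workspace."""
        for path, content in ws.files.items():
            self.archive[f"{ws.project}/{path}"] = content
        for item_id, item in ws.stac_items.items():
            self.catalogue[item_id] = dict(item)


def close_project(ws):
    """Delete the workspace contents at project end."""
    ws.files.clear()
    ws.stac_items.clear()


ws = Workspace("demo")
ws.files["result.tif"] = b"raster-bytes"
ws.stac_items["demo-result"] = {"id": "demo-result"}

prr = ResultsRepository()
prr.archive_workspace(ws)
close_project(ws)

# The workspace is empty, but the archived data and catalogue entry persist.
print(sorted(prr.archive), sorted(prr.catalogue), ws.files)
```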
15.5 Challenges
The outlined solution presents several challenges:
- User Access: With separate catalogues for EarthCODE and APEx, users may be uncertain about which catalogue to access.
- Technical Alignment: Since both initiatives use the ESA Cloud for storage, there is a need for alignment on the technical setup of the PRR storage. This includes:
  - Deciding whether storage and VMs should be separated for each initiative or shared.
  - Aligning other technical aspects if components are shared, such as user authentication and quality control.
  - Clearly defining responsibilities, possibly considering third-party management.
- Ingestion Procedures: Consistent ingestion procedures must be defined across all initiatives using the PRR.
- Responsibility Division: There needs to be a clear division of responsibilities regarding:
  - Ingestion of data into the PRR.
  - Maintenance and monitoring of the ESA Cloud environment.
  - Performance optimization and benchmarking of the ESA Cloud environment.
- Catalogue Management: If a general catalogue is needed for the PRR in addition to the initiative-specific ones, the responsibility for its management and monitoring must be clearly defined.