8  Algorithm Hosting

Algorithm hosting in the context of APEx is about maintaining a curated catalog of algorithms that can be executed on one or more APEx-compliant processing platforms. APEx itself is not a processing platform.

As a result, processing platforms can continue to innovate and improve their services, whether to attract ESA projects or to support algorithms that already run on other processing platforms.

8.1 Hosting mechanisms

8.1.1 openEO UDP

An openEO UDP is essentially a JSON document. There are currently a few ways to host them:

8.1.1.1 openEO UDP API

https://api.openeo.org/#tag/User-Defined-Processes/operation/list-custom-processes

This is how an openEO backend can expose the UDPs it supports. This is useful, but we may additionally want to build a backend-independent, centralized repository of UDPs.
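As an illustration of the listing endpoint linked above, the following minimal sketch prepares a `GET /process_graphs` request and extracts the UDP identifiers from a response body. The backend URL, the token placeholder and the sample response are hypothetical; the request is only built, not sent.

```python
import json
import urllib.request


def build_udp_list_request(backend_url: str, bearer_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET /process_graphs request."""
    url = backend_url.rstrip("/") + "/process_graphs"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def udp_ids(response_body: str) -> list[str]:
    """Extract the UDP identifiers from a /process_graphs JSON response."""
    listing = json.loads(response_body)
    return [process["id"] for process in listing.get("processes", [])]


# Hypothetical backend URL and response body, for illustration only.
request = build_udp_list_request("https://openeo.example.org/openeo/1.2/", "<token>")
sample = '{"processes": [{"id": "ndvi_composite"}, {"id": "cloud_mask"}], "links": []}'

print(request.get_method(), request.full_url)
print(udp_ids(sample))
```

Note that this listing is scoped to one backend and one authenticated user, which is why it does not by itself provide a central repository.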

8.1.1.2 Git based catalog

UDP JSON files can be stored and tracked in a Git repository. This serves two purposes: versioning and distribution.
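A minimal sketch of what such a catalog could look like on disk: one JSON file per UDP, indexed by its `id`. The example UDP, the file layout and the choice of required keys are assumptions made for illustration, not a format defined by APEx.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical UDP: scales a data cube by a constant factor.
EXAMPLE_UDP = {
    "id": "scale_cube",
    "summary": "Multiply every value of a data cube by a factor.",
    "parameters": [
        {"name": "data", "description": "Input data cube.",
         "schema": {"type": "object", "subtype": "datacube"}},
        {"name": "factor", "description": "Scale factor.",
         "schema": {"type": "number"}, "optional": True, "default": 0.0001},
    ],
    "process_graph": {
        "scale": {
            "process_id": "apply",
            "arguments": {
                "data": {"from_parameter": "data"},
                "process": {"process_graph": {"mul": {
                    "process_id": "multiply",
                    "arguments": {"x": {"from_parameter": "x"},
                                  "y": {"from_parameter": "factor"}},
                    "result": True,
                }}},
            },
            "result": True,
        }
    },
}

# Assumption: a catalog entry needs at least an id and a process graph.
REQUIRED_KEYS = {"id", "process_graph"}


def load_udp_catalog(root: Path) -> dict:
    """Read every *.json file below `root` and index the UDPs by their id."""
    catalog = {}
    for path in sorted(root.rglob("*.json")):
        udp = json.loads(path.read_text(encoding="utf-8"))
        missing = REQUIRED_KEYS - udp.keys()
        if missing:
            raise ValueError(f"{path.name}: missing keys {sorted(missing)}")
        catalog[udp["id"]] = udp
    return catalog


# Simulate a repository checkout in a temporary directory.
with tempfile.TemporaryDirectory() as checkout:
    (Path(checkout) / "scale_cube.json").write_text(json.dumps(EXAMPLE_UDP), encoding="utf-8")
    catalog = load_udp_catalog(Path(checkout))

print(sorted(catalog))
```

A check like this could run in the repository's continuous integration, so that malformed UDPs are rejected before they are published.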

To make these UDPs executable without first storing them on a specific backend, an extension of the openEO API is needed:

https://github.com/Open-EO/openeo-api/issues/515

With this option, the UDP still has to be linked somehow to the backends that support it. Federated setups such as openEO platform support this to some extent, but a UDP may well depend on a backend that is not part of a federation.
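To make the URL-based invocation more concrete, here is a sketch of a job request body whose single node references a UDP by URL. It assumes the convention discussed in the linked issue, where the `namespace` field of a process graph node carries the URL of the UDP definition; whether a given backend accepts this is exactly the kind of support information that still has to be tracked. The UDP id, URL and arguments are hypothetical.

```python
import json


def remote_udp_job(udp_id: str, udp_url: str, arguments: dict) -> dict:
    """Build a job request body whose single node references a UDP by URL."""
    node = {
        "process_id": udp_id,
        # Assumption: the backend resolves a URL in `namespace` to a UDP definition.
        "namespace": udp_url,
        "arguments": arguments,
        "result": True,
    }
    return {"process": {"process_graph": {"run_udp": node}}}


# Hypothetical UDP id, URL and arguments.
body = remote_udp_job(
    "ndvi_composite",
    "https://example.org/apex_algorithms/ndvi_composite.json",
    {
        "spatial_extent": {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
        "temporal_extent": ["2023-06-01", "2023-07-01"],
    },
)

print(json.dumps(body, indent=2))
```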

We propose to have one or more repositories for published algorithms under the APEx GitHub organization. Requirement: the source code needs to be under the control of ESA/APEx!

These Git-based algorithms are then registered in the algorithm hosting portal via its UI. Open question: should the portal also expose these algorithms via an API?

8.1.1.3 External UDP dependencies

UDPs are only JSON files, so they may have external dependencies. Some examples:

  • Code for UDFs
  • ML Model binaries
  • Auxiliary raster/vector data loaded via load_stac, load_url

All of these can be served over HTTP, so they can be managed by the binary artifacts component discussed in Section 6.4.
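Because these dependencies are referenced by URL inside the UDP, they can in principle be discovered automatically. The sketch below walks a UDP and collects every HTTP(S) URL it finds. The example UDP and all URLs are hypothetical, and treating every URL-like string as a dependency is a simplifying assumption.

```python
def find_external_urls(value) -> set:
    """Recursively collect all http(s) URLs that occur as string values."""
    urls = set()
    if isinstance(value, dict):
        for child in value.values():
            urls |= find_external_urls(child)
    elif isinstance(value, list):
        for child in value:
            urls |= find_external_urls(child)
    elif isinstance(value, str) and value.startswith(("http://", "https://")):
        urls.add(value)
    return urls


# Hypothetical UDP with a STAC source, a vector file, UDF code and an ML model.
udp = {
    "id": "crop_classifier",
    "process_graph": {
        "aux": {
            "process_id": "load_stac",
            "arguments": {"url": "https://example.org/stac/dem"},
        },
        "fields": {
            "process_id": "load_url",
            "arguments": {"url": "https://example.org/data/fields.geojson", "format": "GeoJSON"},
        },
        "classify": {
            "process_id": "run_udf",
            "arguments": {
                "data": {"from_node": "aux"},
                "udf": "https://example.org/udf/classify.py",
                "runtime": "Python",
                "context": {"model_url": "https://example.org/models/crop.onnx"},
            },
            "result": True,
        },
    },
}

for url in sorted(find_external_urls(udp)):
    print(url)
```

A list like this could be used to verify that every dependency of a published UDP is actually hosted by the binary artifacts component.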

8.1.1.4 UDP limitations & improvements

  • Sometimes parameter values require more complex preprocessing than can easily be achieved with predefined processes.
  • Sometimes load_collection parameters need adjustment based on available data in the (parametrized) AOI.
  • Complex postprocessing
  • UDPs do not yet describe the resulting datacube and its layout.

8.1.2 OGC AP Best Practice

https://terradue.github.io/ogc-eo-application-package-hands-on

An application package is a CWL file, which is a plain text file that has Docker images as dependencies.

Application packages can be deployed on an OGC Processes compatible service, to be executable as a service.
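The Docker dependencies of an application package can be read from its `DockerRequirement` entries. The following sketch assumes the CWL document has already been parsed into a Python dictionary (for example with a YAML parser) and handles both the list and the map notation for `requirements` and `hints`. The package content and image names are hypothetical.

```python
def docker_images(cwl: dict) -> set:
    """Collect the dockerPull images declared in a parsed CWL document."""
    images = set()
    # A packed document lists its processes under "$graph"; otherwise the
    # document itself is the single process.
    for process in cwl.get("$graph", [cwl]):
        for key in ("requirements", "hints"):
            entries = process.get(key, [])
            if isinstance(entries, dict):
                # Map notation: {"DockerRequirement": {"dockerPull": "..."}}
                entries = [{"class": name, **(body or {})} for name, body in entries.items()]
            for entry in entries:
                if entry.get("class") == "DockerRequirement" and "dockerPull" in entry:
                    images.add(entry["dockerPull"])
    return images


# Hypothetical application package with one workflow and two tools.
app_package = {
    "cwlVersion": "v1.0",
    "$graph": [
        {"class": "Workflow", "id": "main", "inputs": {}, "outputs": {}, "steps": {}},
        {
            "class": "CommandLineTool",
            "id": "preprocess",
            "requirements": {"DockerRequirement": {"dockerPull": "example.org/apex/preprocess:1.0"}},
        },
        {
            "class": "CommandLineTool",
            "id": "classify",
            "hints": [{"class": "DockerRequirement", "dockerPull": "example.org/apex/classify:2.1"}],
        },
    ],
}

print(sorted(docker_images(app_package)))
```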

8.2 Joint openEO/OGC AP hosting

Depends on EOEPCA.

Either the hosting component can talk to both standards, or there is a single API that implements one standard and forwards requests accordingly.

https://github.com/Open-EO/openeo-api/blob/f6fcb93585261508a4c8720fd9355d8f9c85961c/crosswalks/ogcapi-processes.md

The OGC API Processes and openEO implementations that we have available could both be made to invoke something of the other variant. For instance, a CWL file could invoke an openEO UDP, or a UDP could invoke a CWL process.

We may want to decide which API is the primary one that our own frontend talks to, although we could eventually support both options. The crosswalk may help here.
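The forwarding idea can be sketched as a small dispatcher that turns one generic execution request into the endpoint and body expected by each standard: a batch job on an openEO backend (`POST /jobs`) or an execution on an OGC API Processes service (`POST /processes/{id}/execution`). The function name, the service URLs and the way inputs are mapped one-to-one are assumptions; a real implementation would follow the crosswalk more closely.

```python
def build_execution_request(algorithm_type: str, service_url: str,
                            process_id: str, inputs: dict) -> tuple:
    """Return (endpoint URL, JSON body) for the standard matching the algorithm type."""
    base = service_url.rstrip("/")
    if algorithm_type == "openeo_udp":
        # openEO: create a batch job whose process graph calls the UDP.
        node = {"process_id": process_id, "arguments": inputs, "result": True}
        return f"{base}/jobs", {"process": {"process_graph": {"run": node}}}
    if algorithm_type == "CWL":
        # OGC API Processes: execute the deployed application package.
        return f"{base}/processes/{process_id}/execution", {"inputs": inputs}
    raise ValueError(f"unsupported algorithm type: {algorithm_type!r}")


# Hypothetical services and inputs.
inputs = {"temporal_extent": ["2023-06-01", "2023-07-01"]}
openeo_url, openeo_body = build_execution_request(
    "openeo_udp", "https://openeo.example.org/openeo/1.2", "ndvi_composite", inputs)
ogc_url, ogc_body = build_execution_request(
    "CWL", "https://ogc.example.org/api/", "ndvi_composite", inputs)

print(openeo_url)
print(ogc_url)
```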

8.3 Accounting & NoR integration

APEx algorithm hosting has some important requirements:

  • Offer processes with fixed cost per km²
  • Allow NoR packages with credits for specific algorithms.
  • Map users to APEx service accounts?
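A fixed cost per km² implies that the price of a job can be estimated from its area of interest before it runs. The sketch below computes the area of a longitude/latitude bounding box on a sphere and multiplies it by a rate. The rate, the credit balance and the bounding box are hypothetical, and the spherical approximation is an assumption that ignores the ellipsoid and boxes crossing the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def bbox_area_km2(west: float, south: float, east: float, north: float) -> float:
    """Area of a lon/lat bounding box (degrees) on a sphere, in km²."""
    lat_term = math.sin(math.radians(north)) - math.sin(math.radians(south))
    lon_term = math.radians(east - west)
    return EARTH_RADIUS_KM ** 2 * abs(lat_term) * abs(lon_term)


def job_cost(area_km2: float, price_per_km2: float) -> float:
    """Fixed-rate cost: area multiplied by the price per km²."""
    return area_km2 * price_per_km2


def can_run(cost: float, remaining_credits: float) -> bool:
    """Check a job's cost against the credits left in a package."""
    return cost <= remaining_credits


# Hypothetical values: a 1° x 1° box at the equator, 0.01 credits per km².
area = bbox_area_km2(west=0.0, south=0.0, east=1.0, north=1.0)
cost = job_cost(area, price_per_km2=0.01)

print(f"area: {area:.1f} km2, cost: {cost:.2f} credits")
print("enough credits:", can_run(cost, remaining_credits=500.0))
```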

8.4 Output standardization

When an algorithm generates a result, that result may also need further handling, so it needs to be standardized.

openEO has quite rich output metadata specified by the standard, but we still need to find something similar for OGC Processes. There, the output can seemingly be anything, so it is less standardized.

8.5 NoR Onboarding

Algorithms are onboarded in NoR. If algorithms are described as code in a public GitHub repository, NoR could perhaps consider ingesting them automatically.

8.6 Hosted Algorithm Catalogue

8.6.1 Description

All APEx-compatible algorithms will be promoted through the APEx Algorithm Catalogue. This catalogue provides a comprehensive overview of all onboarded algorithms and workflows, featuring robust filtering and search capabilities. Its online user interface offers users, projects or organizations a simple and intuitive onboarding process to share their algorithms and workflows. The catalogue is able to support multiple technologies, such as openEO UDP and OGC AP.

Leveraging the existing capabilities of APEx-compliant platforms, users can:

  • Provide comprehensive service descriptions
  • Define code examples
  • Link their service to an external user interface supporting service execution

This concept has already proven its utility by being deployed on the following platforms:

8.6.2 Integration with APEx

Other APEx components will use this catalogue as the central repository to retrieve algorithms. An OGC API Records interface will enable the algorithms to be discovered in a machine-readable manner, facilitating seamless integration and interoperability across various APEx services and components.

In EOEPCA, this would be the ‘Executable service’ resource type:

https://eoepca.readthedocs.io/projects/architecture/en/latest/reference-architecture/resource-management-concepts/#resource-types

An example of an OGC Record describing an algorithm:

https://github.com/ESA-APEx/apex_algorithms/blob/main/algorithm_catalog/worldcereal.json

algorithm_type: CWL|openeo_udp
algorithm_url: link to CWL file or UDP json or deployed process?
description: link to description markdown file
supported_services: list of links to services that are known to run this algorithm
algorithm_cost: cost per square km
algorithm_owner: how to link this to person or organization?
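The fields listed above can be sketched as a small data structure with basic validation. This is only an illustration of the proposed field names, not the schema of the linked record; the `id` field, the example values and the URLs are hypothetical.

```python
from dataclasses import asdict, dataclass, field

ALGORITHM_TYPES = {"CWL", "openeo_udp"}


@dataclass
class AlgorithmRecord:
    id: str
    algorithm_type: str                 # "CWL" or "openeo_udp"
    algorithm_url: str                  # link to the CWL file or UDP JSON
    description: str                    # link to a description markdown file
    supported_services: list = field(default_factory=list)
    algorithm_cost: float = 0.0         # cost per square km
    algorithm_owner: str = ""           # open question: person or organization?

    def __post_init__(self):
        if self.algorithm_type not in ALGORITHM_TYPES:
            raise ValueError(f"unknown algorithm_type: {self.algorithm_type!r}")


def supported_on(records: list, service_url: str) -> list:
    """Return the ids of all algorithms known to run on the given service."""
    return [r.id for r in records if service_url in r.supported_services]


# Hypothetical catalog entries.
records = [
    AlgorithmRecord(
        id="ndvi_composite",
        algorithm_type="openeo_udp",
        algorithm_url="https://example.org/apex_algorithms/ndvi_composite.json",
        description="https://example.org/apex_algorithms/ndvi_composite.md",
        supported_services=["https://openeo.example.org"],
        algorithm_cost=0.01,
    ),
    AlgorithmRecord(
        id="water_mask",
        algorithm_type="CWL",
        algorithm_url="https://example.org/apex_algorithms/water_mask.cwl",
        description="https://example.org/apex_algorithms/water_mask.md",
        supported_services=["https://ogc.example.org"],
    ),
]

print(supported_on(records, "https://openeo.example.org"))
print(asdict(records[1])["algorithm_type"])
```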

8.6.3 Architecture

The algorithm catalogue provides an extensive solution with many valuable features, including dedicated group management, accounting, billing, and usage reporting. This solution is built using a microservices architecture, allowing for the toggling of features based on need. Figure 8.1 shows the architecture of the complete solution. Initially, APEx will offer only the algorithm catalogue. However, depending on the evolving needs of APEx, more features can be enabled. Therefore, it is essential to consider the full solution in the overall architecture setup.

Figure 8.1: Algorithm Catalogue - Full Architecture

8.6.4 Dependencies on shared components

The APEx Algorithm Catalogue relies on several components for its correct functioning. Figure 8.2 outlines the dependencies of the various microservices on shared functional components. Some dependencies, such as reporting and log aggregation, apply to all microservices.

  • Log Aggregation
  • Observability
  • Monitoring
  • DNS configuration
  • Identity and Access Management
  • API Gateway / reverse proxy
  • SMTP Server

Note

The PSP component in Figure 8.2 represents an external payment gateway to handle financial transactions. This is considered out of scope of the APEx project.

Figure 8.2: Algorithm Catalogue - Full Architecture