9  Algorithm pipelines

Propagation services are supported by a number of automation pipelines.

9.0.1 General principles

  • Keep the design simple: favour static files rendered in the browser over complex web services.
  • Rely on GitHub Actions to run the workflows; Argo CI is a possible alternative.
  • Use git to maintain metadata records rather than a record service.
  • Use tools such as Quarto or Jupyter notebooks to render HTML reports that can be served as static files.
  • Object storage can be used to store artifacts.

Algorithm pipelines

Components:

  • App Git Repo: an ESA project repository, maintained by the project. Mostly relevant if the algorithm uses it, for instance for UDF code.
  • Algorithm Catalog: a list of openEO or OGC AP based algorithms, hosted by APEx. The catalog holds generic metadata records, not the actual standard-specific process definitions.
  • Backend Catalog: a list of compliant backends. Allows users to discover the available backends.
  • Algorithm runs: a parquet file containing one row per run of a specific UDP. Allows us to plot statistics.
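
As an illustration of the "Algorithm runs" component, the sketch below groups run records per UDP and backend and derives a few statistics that could be plotted. The column names (udp_id, backend, duration_s, succeeded) and the function summarise_runs are hypothetical; the actual parquet schema is not defined here. Plain Python objects stand in for rows that would normally be read from the parquet file.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class AlgorithmRun:
    # Hypothetical columns; the real parquet schema is not specified in this document.
    udp_id: str
    backend: str
    duration_s: float
    succeeded: bool


def summarise_runs(runs):
    """Group runs by (udp_id, backend) and compute simple per-group statistics."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run.udp_id, run.backend)].append(run)

    summary = {}
    for key, group in groups.items():
        summary[key] = {
            "runs": len(group),
            # Fraction of runs in the group that finished successfully.
            "success_rate": sum(r.succeeded for r in group) / len(group),
            "mean_duration_s": mean(r.duration_s for r in group),
        }
    return summary
```

Keeping one row per run means such summaries can be recomputed at any time, which fits the static-file approach: a scheduled workflow can regenerate the report without a dedicated service.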

9.0.2 Quality & compliance test

  • Static code analysis

9.0.2.1 Python tools

Python libraries can use pylint to compute a code quality score. An open question is how APEx can show an overview of related GitHub repositories together with their pylint scores and potentially some other metrics. The simplest solution is probably for APEx to run pylint itself. APEx compliance guidelines can be used to ensure a proper project organization.
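
A minimal sketch of how a pipeline might obtain the score, assuming pylint is installed in the environment and prints its usual "Your code has been rated at X/10" summary line in the text report. The helper names run_pylint and parse_pylint_score are hypothetical.

```python
import re
import subprocess
import sys

# Matches the summary line of pylint's text report, e.g. "rated at 8.50/10".
_SCORE_RE = re.compile(r"rated at (-?\d+(?:\.\d+)?)/10")


def parse_pylint_score(report_text):
    """Return the pylint score as a float, or None if no score line is found."""
    match = _SCORE_RE.search(report_text)
    return float(match.group(1)) if match else None


def run_pylint(path):
    """Run pylint on a file or package and return its score (None if absent)."""
    # pylint exits with a non-zero status when it reports messages,
    # so the return code is deliberately not treated as an error here.
    completed = subprocess.run(
        [sys.executable, "-m", "pylint", path],
        capture_output=True,
        text=True,
    )
    return parse_pylint_score(completed.stdout)
```

The resulting score could then be written to a static JSON or parquet file per repository, from which the overview is rendered.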

A UDP can link to its source git repository, from which APEx can harvest the repositories to show this overview.

The test results can be rendered either by a custom JavaScript-based web app or by a Quarto dashboard.

9.0.3 Integration test

  • End-to-end test on backends that support the algorithm.
  • Runs weekly or upon changes.
  • Compares against reference output.
  • Records performance metrics.
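
The steps above can be sketched as a single check that runs the algorithm, compares its output with the reference and records the elapsed time. This is only an illustration: run_integration_check is a hypothetical helper, the output is assumed to be a flat sequence of numbers, and the tolerances are placeholder values rather than agreed thresholds.

```python
import math
import time


def run_integration_check(run_algorithm, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Run the algorithm, compare with the reference, and record the duration."""
    start = time.perf_counter()
    result = run_algorithm()
    elapsed = time.perf_counter() - start

    # Outputs match when they have the same length and every value is
    # within tolerance of the corresponding reference value.
    passed = len(result) == len(reference) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(result, reference)
    )
    return {"passed": passed, "duration_s": elapsed}
```

A tolerance-based comparison is used because floating-point results can differ slightly between backends even when the algorithm is implemented correctly.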

9.0.4 Benchmark

  • Performs dedicated runs to build sufficient statistics to compute a cost distribution.
  • Computes cost per km² using a standardized formula such as µ + 2σ (mean plus two standard deviations).
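
A minimal sketch of the µ + 2σ calculation, assuming each benchmark run reports a total cost and the processed area in km². The function name cost_per_km2 is hypothetical, and the sample standard deviation is used here as an assumption, since the formula above does not say whether the sample or population variant is meant.

```python
from statistics import mean, stdev


def cost_per_km2(costs, areas_km2, k=2.0):
    """Return mean + k * standard deviation of the per-km² cost across runs."""
    # Normalise each run's cost by the area it processed.
    per_km2 = [cost / area for cost, area in zip(costs, areas_km2)]
    # stdev() is the sample standard deviation and needs at least two runs.
    return mean(per_km2) + k * stdev(per_km2)
```

For example, three runs over 10 km² each with costs of 10, 20 and 30 give per-km² costs of 1, 2 and 3, so µ = 2, σ = 1 and the reported cost is 4 per km².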

9.0.5 Release pipeline

  • Releases a new version of an algorithm or software
  • Tracks changes in a changelog, linking to the issue tracker?
  • Publishes/deploys artifacts

9.0.6 Pipeline tools

Candidate tools:

  • Argo CI
  • Jenkins (not foreseen in APEx)
  • NiFi (not foreseen in APEx)
  • DVC: https://mlops-guide.github.io/Versionamento/pipelines_dvc/