Updated 14/02/2024
Core DTE Modules

yProv

Description

An open-source service to support provenance management within scientific workflows.

yProv is an open-source software ecosystem that supports provenance management within scientific workflows. It relies on the W3C PROV family of standards, a RESTful interface, and a graph database back-end based on Neo4j. The yProv web service, the main component, is implemented in Python with the Flask micro-framework, which builds on the Jinja2 template engine and the Werkzeug WSGI toolkit. The service is domain-agnostic, though its primary case studies in the project come from the climate change domain (i.e., climate analytics workflows). It aims to implement the micro-provenance concept, to navigate the provenance space across different dimensions (e.g., horizontal and vertical). yProv also includes a command line interface and support for provenance tracking in AI, which adds capabilities for key, recurring use cases across different digital twins (DTs).
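
To make the W3C PROV data model more concrete, the sketch below builds a minimal provenance document in the PROV-JSON serialization, using only the Python standard library. The `ex` namespace, all identifiers, and the attribute values are illustrative assumptions rather than anything defined by yProv; which serialization a given yProv deployment expects should be checked against its own documentation.

```python
import json


def build_prov_document():
    """Return a minimal PROV-JSON document as a plain Python dict.

    Every ex:... identifier is hypothetical and used for illustration only.
    """
    return {
        # Namespace prefixes used by the identifiers below.
        "prefix": {"ex": "http://example.org/"},
        # Entities: the data items whose provenance is recorded.
        "entity": {
            "ex:input-dataset": {"prov:label": "Input climate dataset"},
            "ex:output-dataset": {"prov:label": "Derived analytics product"},
        },
        # Activities: the processing steps acting on entities.
        "activity": {
            "ex:analytics-run": {
                "prov:startTime": "2024-02-14T09:00:00",
                "prov:endTime": "2024-02-14T09:30:00",
            }
        },
        # Agents: who or what is responsible for an activity.
        "agent": {"ex:researcher": {"prov:label": "Example researcher"}},
        # Relations linking the three kinds of node.
        "used": {
            "_:u1": {
                "prov:activity": "ex:analytics-run",
                "prov:entity": "ex:input-dataset",
            }
        },
        "wasGeneratedBy": {
            "_:g1": {
                "prov:entity": "ex:output-dataset",
                "prov:activity": "ex:analytics-run",
            }
        },
        "wasAssociatedWith": {
            "_:a1": {
                "prov:activity": "ex:analytics-run",
                "prov:agent": "ex:researcher",
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_prov_document(), indent=2))
```

Entities, activities, and agents map naturally onto nodes of a graph, and the relations onto edges, which is why a graph database is a reasonable fit for this kind of data.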

Users can use the yProv service to manage (i.e., store, retrieve, explore, and visualise) the provenance information associated with scientific datasets, and thereby gain a better understanding of those datasets. The value proposition is (i) stronger traceability, transparency, and trust through a richer set of metadata, and (ii) multi-level, multidimensional exploration and navigation of provenance metadata.
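
As an illustration of what exploring provenance can mean in practice, the following sketch walks a small PROV-JSON-style document backwards to find every entity a given dataset was derived from. It is an in-memory example under stated assumptions: the function name `upstream_entities` and all `ex:` identifiers are hypothetical, and it does not reflect how yProv queries its graph back-end.

```python
def upstream_entities(doc, entity_id):
    """Return the set of entities that entity_id transitively depends on.

    Follows wasGeneratedBy (entity -> activity) and used (activity -> entity)
    relations of a PROV-JSON-style dict.
    """
    # Index: entity -> activities that generated it.
    generated_by = {}
    for rel in doc.get("wasGeneratedBy", {}).values():
        generated_by.setdefault(rel["prov:entity"], set()).add(rel["prov:activity"])

    # Index: activity -> entities it used as input.
    used_by = {}
    for rel in doc.get("used", {}).values():
        used_by.setdefault(rel["prov:activity"], set()).add(rel["prov:entity"])

    seen = set()
    stack = [entity_id]
    while stack:
        current = stack.pop()
        for activity in generated_by.get(current, ()):
            for source in used_by.get(activity, ()):
                if source not in seen:  # guards against revisiting nodes
                    seen.add(source)
                    stack.append(source)
    return seen


# Hypothetical two-step workflow: raw -> regrid -> regridded -> stats -> summary
example_doc = {
    "used": {
        "_:u1": {"prov:activity": "ex:regrid", "prov:entity": "ex:raw"},
        "_:u2": {"prov:activity": "ex:stats", "prov:entity": "ex:regridded"},
    },
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:regridded", "prov:activity": "ex:regrid"},
        "_:g2": {"prov:entity": "ex:summary", "prov:activity": "ex:stats"},
    },
}

if __name__ == "__main__":
    print(sorted(upstream_entities(example_doc, "ex:summary")))
```

For `ex:summary` the traversal reaches both `ex:regridded` and `ex:raw`, that is, the full lineage of the final product.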

Target Audience
Scientific users, both producers and consumers of datasets. End users can interact with the service via the yProv RESTful API to manage the provenance information through CRUD (create, read, update, delete) operations.
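
A minimal sketch of how such CRUD interactions could be prepared from Python with the standard library is shown below. The base URL, the `/documents/<id>` route, and the choice of HTTP verbs are assumptions made for illustration; the actual routes should be taken from the yProv API documentation. The requests are only built, not sent.

```python
import json
import urllib.request

# Hypothetical address and route prefix: replace with those of a real deployment.
BASE_URL = "http://localhost:3000/api/v0"


def build_request(method, doc_id, payload=None):
    """Prepare (but do not send) an HTTP request for one provenance document."""
    url = f"{BASE_URL}/documents/{doc_id}"
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # Serialise the provenance document as a JSON request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Create/update, read, and delete requests for a hypothetical document id.
create = build_request("PUT", "demo-doc", {"prefix": {"ex": "http://example.org/"}})
read = build_request("GET", "demo-doc")
delete = build_request("DELETE", "demo-doc")

if __name__ == "__main__":
    for req in (create, read, delete):
        print(req.get_method(), req.full_url)
```

Against a running service, each prepared request could then be sent with `urllib.request.urlopen(req)`, with error handling added as needed.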

License
GPLv3

Created by
Release Notes

yProv has been adopted in interTwin to implement provenance support within scientific workflows, starting from case studies identified in the environmental domain (i.e., climate data analytics workflows). With an initial design focused on the core service, yProv delivers a software ecosystem that includes a service, libraries, and tools. The current release has been integrated:

  • in climate-related DTs for extreme events both at CMCC and UNITN;
  • with itwinai (in collaboration with CERN);
  • with IM for automated cloud deployment over Kubernetes clusters (in collaboration with UPV);
  • with the SQAaaS platform through the available API (in collaboration with CSIC).

Future Plans

yProv will evolve during interTwin to accommodate additional requirements. Ongoing and future activities include: (i) the integration of provenance tracking in openEO (currently at an early stage); (ii) a new release of yProv that fixes minor bugs as they are discovered; (iii) provenance support in the SQA process; (iv) stronger integration of metrics within the yProv libraries; (v) a first release of the yProv explorer, which will offer a graphical UI to navigate and inspect provenance documents; and (vi) the integration with additional DTs.