Updated 17/03/2025
DTE Infrastructure Component

Onedata S3

Federated Data Infrastructure

Description

Onedata is a high-performance, distributed data management system designed for global infrastructures. It provides seamless access to heterogeneous storage resources and supports diverse use cases ranging from personal data management to large-scale scientific computations. Leveraging a fully distributed architecture, Onedata facilitates the creation of hybrid cloud environments that integrate private and public cloud resources.

When deployed, Onedata enables users to collaborate on, share, and publish data, and to run high-performance computations on distributed datasets. Data can be accessed through various interfaces, including POSIX-compliant native mounts, pyfs (Python filesystem) plugins, REST/CDMI APIs, and the S3 protocol (currently in beta).

Onedata supports on-the-fly replication, data pre-staging, data indexing, data caching, and time-dependent cache cleanup.
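
Time-dependent cache cleanup can be pictured as a cache that records when each replica was last accessed and evicts those untouched for longer than a time-to-live. The sketch below is a hypothetical illustration of that general idea, not Onedata's implementation; the class name `ReplicaCache`, the `ttl_seconds` parameter, and the injectable `clock` are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReplicaCache:
    """Hypothetical cache of local replicas with time-based cleanup.

    Illustrates the general technique only; not Onedata's code.
    """
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _last_access: Dict[str, float] = field(default_factory=dict, init=False)

    def touch(self, file_id: str) -> None:
        # Record (or refresh) the last time this replica was read or written.
        self._last_access[file_id] = self.clock()

    def cleanup(self) -> List[str]:
        # Evict every replica not accessed within ttl_seconds; return their ids.
        now = self.clock()
        expired = [f for f, t in self._last_access.items()
                   if now - t > self.ttl_seconds]
        for file_id in expired:
            del self._last_access[file_id]
        return sorted(expired)

    def __contains__(self, file_id: str) -> bool:
        return file_id in self._last_access
```

With a 60-second TTL, a replica last touched at t=0 is evicted by a cleanup at t=100, while one touched at t=50 survives.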

By supporting multiple types of storage backends, such as POSIX, S3, Ceph, and OpenStack Swift, Onedata can serve as a unified virtual file system for multi-cloud environments.
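
The idea of a unified virtual file system over heterogeneous backends can be sketched as a namespace that maps the first path component (here treated as a Onedata-style "space") to whichever backend stores it. This is a deliberately simplified, hypothetical model: in a real deployment a space may be supported by several providers and storages, and the names `StorageBackend`, `VirtualNamespace`, and `resolve` are invented for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Tuple


@dataclass(frozen=True)
class StorageBackend:
    kind: str  # e.g. "posix", "s3", "ceph", "swift"
    root: str  # backend-specific root: mount point, bucket, pool, container


class VirtualNamespace:
    """Hypothetical single namespace routing paths to storage backends."""

    def __init__(self, spaces: Dict[str, StorageBackend]):
        self._spaces = dict(spaces)

    def resolve(self, virtual_path: str) -> Tuple[StorageBackend, str]:
        parts = PurePosixPath(virtual_path).parts
        # For an absolute path, parts[0] is "/" and parts[1] names the space.
        if len(parts) < 2 or parts[0] != "/":
            raise ValueError(f"expected /space/..., got {virtual_path!r}")
        space = parts[1]
        if space not in self._spaces:
            raise KeyError(f"unknown space: {space}")
        # The remainder is the path relative to the backend's root.
        return self._spaces[space], "/".join(parts[2:])
```

Clients see one tree of paths, while each subtree can live on a different kind of storage; swapping a backend only changes the mapping, not the paths users work with.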

Its rich API enables integration with existing systems and the creation of complex data pipelines that serve distributed workloads.

Target Audience

In the context of the interTwin Datalake, all users who need access to datasets available in Onedata can access them through the Onedata S3 component, which is in turn used by Rucio.

License

Apache 2.0

Created by

Release Notes

This is the final release of the interTwin federated data management solution.

The solution includes two external software components, FTS and Rucio. Both are well-established projects, independent of the interTwin project. Their software is production-ready (TRL 9) and has been hardened by many years of production-critical use, and each project has multiple deployments operated by different communities.

The ALISE software is still in development, under the aegis of interTwin. At the time of this release, ALISE is at TRL 4. Its user-facing functionality is mostly feature-complete; however, anticipated changes to the API mean that the integration work needed for a service to use ALISE to identify a user should be considered experimental. Feedback from early adopters is encouraged, but any plans to deploy ALISE should take these anticipated API changes into account.

The teapot software has also been developed within the interTwin project. With this release, teapot is now at TRL 6–7 and supports the data transfer requirements of multiple concurrent users. Management of per-user WebDAV instances is automated: new instances are started on demand and terminated once they have been idle for long enough.
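
The on-demand start and idle-timeout termination of per-user WebDAV instances follows a common lifecycle pattern, sketched below under stated assumptions. This is not teapot's code: the class `PerUserServiceManager`, its `handle_request` and `reap_idle` methods, and the `start`/`stop` callbacks are hypothetical stand-ins for whatever actually launches and shuts down a WebDAV server.

```python
import time
from typing import Callable, Dict, List


class PerUserServiceManager:
    """Hypothetical sketch: one service instance per user, started on the
    first request and stopped after idle_timeout seconds without requests."""

    def __init__(self, idle_timeout: float,
                 start: Callable[[str], object],
                 stop: Callable[[object], None],
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout
        self._start = start    # launches an instance for a user
        self._stop = stop      # shuts an instance down
        self._clock = clock    # injectable for testing
        self._instances: Dict[str, object] = {}
        self._last_used: Dict[str, float] = {}

    def handle_request(self, user: str) -> object:
        # Start the user's instance if none is running, then record activity.
        if user not in self._instances:
            self._instances[user] = self._start(user)
        self._last_used[user] = self._clock()
        return self._instances[user]

    def reap_idle(self) -> List[str]:
        # Stop every instance idle for longer than the timeout.
        now = self._clock()
        idle = [u for u, t in self._last_used.items()
                if now - t > self.idle_timeout]
        for user in idle:
            self._stop(self._instances.pop(user))
            del self._last_used[user]
        return sorted(idle)

    def running(self) -> List[str]:
        return sorted(self._instances)
```

In practice `reap_idle` would be called periodically, for example from a timer; a request arriving after an instance was reaped simply starts a fresh one.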

Finally, the first version of the Onedata S3 component is released, allowing Onedata technology to be integrated into the interTwin federated data management solution.

Future Plans

Some further improvements are planned for teapot. These include integrating teapot with ALISE to support automated identity management.

For ALISE, we anticipate possible improvements to, and stabilisation of, the service-integration API, based on experience gained from integrating ALISE into various services. In addition, we plan to add support for client authentication in future versions of ALISE. This will restrict the identity-mapping information to authorised services only.
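
The planned client authentication amounts to gating identity-mapping lookups on the identity of the calling service. The sketch below shows that idea with a simple shared-secret check; it is entirely hypothetical and does not reflect ALISE's actual or future API. The class `IdentityMappingStore`, its methods, and the use of shared secrets (rather than, for example, OAuth2 client credentials) are assumptions chosen for brevity.

```python
import hmac
from typing import Dict, Optional, Tuple


class IdentityMappingStore:
    """Hypothetical sketch: identity mappings released only to authorised
    service clients. Not ALISE's API."""

    def __init__(self, client_secrets: Dict[str, str]):
        # client_id -> shared secret for each authorised service
        self._client_secrets = dict(client_secrets)
        # (issuer, subject) of a federated identity -> local account name
        self._links: Dict[Tuple[str, str], str] = {}

    def link(self, issuer: str, subject: str, local_user: str) -> None:
        self._links[(issuer, subject)] = local_user

    def lookup(self, client_id: str, client_secret: str,
               issuer: str, subject: str) -> Optional[str]:
        expected = self._client_secrets.get(client_id)
        # Reject unknown clients; compare secrets in constant time.
        if expected is None or not hmac.compare_digest(expected, client_secret):
            raise PermissionError(f"client {client_id!r} is not authorised")
        return self._links.get((issuer, subject))
```

An unauthenticated or unknown caller gets an error instead of the mapping, which is the access restriction the plan describes.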

As work continues on integrating the Datalake with various science use cases, limitations may be found in its components. Any such problems will be reported to the corresponding component’s development and support teams. Members of interTwin will contribute effort to fix these issues if capacity becomes available during the project’s remaining lifetime.