Onedata is a high-performance, distributed data management system designed for global infrastructures. It provides seamless access to heterogeneous storage resources and supports diverse use cases ranging from personal data management to large-scale scientific computations. Leveraging a fully distributed architecture, Onedata facilitates the creation of hybrid cloud environments that integrate private and public cloud resources.
Once deployed, Onedata lets users collaborate on, share, and publish data, and run high-performance computations on distributed datasets. Data is accessible through several interfaces: POSIX-compliant native mounts, pyfs (Python filesystem) plugins, REST/CDMI APIs, and the S3 protocol (currently in beta).
Onedata supports on-the-fly replication, data pre-staging, data indexing, data caching, and time-dependent cache cleanup.
By supporting multiple types of storage backends, such as POSIX, S3, Ceph, and OpenStack Swift, Onedata can serve as a unified virtual file system for multi-cloud environments.
The rich API enables integration with existing systems and the creation of complex data pipelines that serve distributed workloads.
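Because a POSIX-compliant mount exposes data as ordinary files and directories, existing tools can work on it without any Onedata-specific code. The following is a minimal sketch under that assumption: it uses only the Python standard library, and both the helper name `summarize_space` and the mount path `/mnt/onedata/my-space` are hypothetical placeholders, not part of Onedata.

```python
from pathlib import Path


def summarize_space(mount_root: str) -> dict:
    """Count regular files and total bytes under a mounted directory tree.

    Uses only standard file system calls, so it works on any POSIX path,
    including a directory where a remote space has been mounted.
    """
    root = Path(mount_root)
    file_count = 0
    total_bytes = 0
    for path in root.rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    return {"files": file_count, "bytes": total_bytes}


if __name__ == "__main__":
    # Hypothetical mount point; substitute the directory where the data is mounted.
    print(summarize_space("/mnt/onedata/my-space"))
```

The same function runs unchanged against a local directory, which is the point of a POSIX interface: analysis code does not need to know where the data is physically stored.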