Built on more than a decade of experience, Rucio serves the data needs of modern scientific experiments. It brings together large volumes of data, very large numbers of files, heterogeneous storage systems, globally distributed data centres, monitoring, and analytics in a modular solution that can be adapted to your needs. Rucio provides a scalable service for managing the dynamic locality of files in a heterogeneous, federated storage data lake.
When deployed, Rucio provides a service that lets a group of researchers manage non-trivial amounts of data. For any given file, dataset (a collection of files), or container (a collection of files and datasets), it reports where that data is currently available. It also supports dynamic, time-limited data placement, in which data is made available for a set period (e.g., to support a computational workflow). It optimises the use of available storage by operating a cache, on the assumption that previously used data is more likely to be needed again. Desired data locality is expressed as declarative rules, which may be applied both to existing datasets and to anticipated future data.
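
To make the idea of declarative, time-limited placement rules more concrete, here is a minimal Python sketch. It is a toy model only and does not use Rucio's actual API or schema: the names `Rule`, `missing_copies`, `storage_tag`, and the site and dataset names are all hypothetical, chosen for illustration. A rule states how many copies of some data should exist on a given kind of storage and for how long; a helper then works out how many copies are still missing.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical declarative placement rule (not Rucio's real schema)."""
    data_id: str                      # name of a file, dataset, or container
    copies: int                       # number of replicas that should exist
    storage_tag: str                  # kind of storage that qualifies
    created_at: float                 # creation time, in seconds
    lifetime: Optional[float] = None  # seconds until expiry; None = no expiry

    def is_active(self, now: float) -> bool:
        """A rule applies until its lifetime, if any, has elapsed."""
        return self.lifetime is None or now < self.created_at + self.lifetime


def missing_copies(
    rule: Rule,
    replicas: Mapping[str, Sequence[str]],
    storage_tags: Mapping[str, AbstractSet[str]],
    now: float,
) -> int:
    """Return how many replicas must still be created to satisfy the rule."""
    if not rule.is_active(now):
        return 0  # an expired rule no longer requires any placement
    # Data that is not registered yet simply has no replicas so far.
    sites = replicas.get(rule.data_id, ())
    matching = [s for s in sites if rule.storage_tag in storage_tags.get(s, set())]
    return max(0, rule.copies - len(matching))


# Illustrative catalogue state (all names are made up).
storage_tags = {"SITE_A": {"disk"}, "SITE_B": {"disk", "tape"}, "SITE_C": {"tape"}}
replicas = {"run42.dataset": ["SITE_A"]}

# Keep two disk copies of an existing dataset for one hour.
workflow_rule = Rule("run42.dataset", copies=2, storage_tag="disk",
                     created_at=0.0, lifetime=3600.0)
# A rule on anticipated data that has not been registered yet.
future_rule = Rule("run43.dataset", copies=1, storage_tag="tape", created_at=0.0)

print(missing_copies(workflow_rule, replicas, storage_tags, now=10.0))    # 1
print(missing_copies(workflow_rule, replicas, storage_tags, now=7200.0))  # 0
print(missing_copies(future_rule, replicas, storage_tags, now=10.0))      # 1
```

The rule describes only the desired state, not the transfers needed to reach it. In this sketch the same rule object works for existing data and for data that has yet to arrive, and once its lifetime has passed it stops requiring any copies, so the storage it occupied could be treated as reclaimable cache space.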