One of the innovative aspects of the BiobankCloud PaaS is the ability to interconnect several PaaS deployments in a cloud federation. This enables easy-to-use data sharing and allows the use of public clouds (e.g., Amazon S3, Azure Blob Storage) for storing data. This federation, dubbed Overbank, is implemented through a novel cloud-backed storage system called Charon. Furthermore, we want to give authorized bioinformaticians a “Dropbox-like” experience when accessing biobank datasets stored in Charon.

The design of Charon exploits recent innovations in cloud-of-clouds storage systems. The main idea is to store files in multiple independent cloud storage services, keeping the data available and secure even if some of these services suffer faults or security incidents.
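
The cloud-of-clouds idea can be illustrated with a minimal sketch. It assumes plain full replication across simulated providers and an expected SHA-256 digest kept by the client; the class `InMemoryCloud` and the functions `write_replicated` and `read_replicated` are hypothetical names used only for illustration, and the actual Charon protocol (documented in the project deliverables) may differ substantially.

```python
import hashlib


class CloudUnavailable(Exception):
    """Raised by a simulated cloud that is currently faulty."""


class InMemoryCloud:
    """Hypothetical stand-in for one storage provider (not a real SDK)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.available = True

    def put(self, key, data):
        if not self.available:
            raise CloudUnavailable(self.name)
        self.objects[key] = data

    def get(self, key):
        if not self.available:
            raise CloudUnavailable(self.name)
        return self.objects[key]


def write_replicated(clouds, key, data, quorum):
    """Store data on every reachable cloud; succeed if `quorum` clouds accept it."""
    digest = hashlib.sha256(data).hexdigest()
    acks = 0
    for cloud in clouds:
        try:
            cloud.put(key, data)
            acks += 1
        except CloudUnavailable:
            continue
    if acks < quorum:
        raise RuntimeError(f"only {acks} of {len(clouds)} clouds accepted the write")
    return digest


def read_replicated(clouds, key, digest):
    """Return the first copy whose SHA-256 matches the expected digest."""
    for cloud in clouds:
        try:
            data = cloud.get(key)
        except (CloudUnavailable, KeyError):
            continue  # unreachable cloud or missing object: try the next one
        if hashlib.sha256(data).hexdigest() == digest:
            return data
    raise RuntimeError("no cloud returned a valid copy")
```

In this sketch a read still succeeds when one provider is unreachable and another returns a corrupted object, as long as at least one intact copy remains. The digest is assumed to come from trusted metadata.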

A number of characteristics make Charon different from any system currently available:

  • A truly serverless design: the whole system is implemented at the client side, relying only on widely available cloud services for storing data and executing coordination, which minimizes the operational effort required to maintain the shared infrastructure;
  • Dependable metadata storage: all file system and biobank-specific metadata are stored in a cloud-of-clouds, making them available to authorized users despite possible communication and cloud provider failures;
  • Flexible data location: because of legal, performance and criticality constraints, it must be possible to store shareable data at the edges (file system clients), in a single cloud or in a cloud-of-clouds;
  • Efficient read/write and read/read sharing: Charon must be as efficient as possible when reading data created by others while, at the same time, the infrastructure automatically manages consistency issues and write-write conflicts.

Charon relies only on cloud storage services such as Amazon S3, without requiring any dedicated process other than the ones running on the file system clients (i.e., biobank servers and bioinformaticians' desktops). Furthermore, the system manages file metadata and file data in different ways. Metadata is encapsulated in namespace containers (shared or private, depending on the visibility of the directory sub-tree) that are stored in the cloud-of-clouds, while file data can be kept at different locations. When a file is created, users can specify whether it will be maintained locally, in a single cloud provider or in multiple cloud providers.
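
The separation between metadata and data placement can be sketched as follows. This is an illustrative model only: the names `Location`, `FileEntry`, `data_targets` and `metadata_container`, as well as the string labels they return, are assumptions made for this example and are not part of Charon's interface.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    """Where a file's data is kept (chosen at file creation)."""
    LOCAL = "local"
    SINGLE_CLOUD = "single-cloud"
    CLOUD_OF_CLOUDS = "cloud-of-clouds"


@dataclass(frozen=True)
class FileEntry:
    path: str
    location: Location
    shared: bool  # visibility of the directory sub-tree holding the file


def data_targets(entry, clouds, preferred_cloud=None):
    """Return the places where the file's data would be written."""
    if entry.location is Location.LOCAL:
        return ["local-disk"]
    if entry.location is Location.SINGLE_CLOUD:
        # Fall back to the first configured cloud when none is preferred.
        return [preferred_cloud or clouds[0]]
    return list(clouds)


def metadata_container(entry):
    """Metadata always lives in a namespace container in the cloud-of-clouds;
    only the kind of container depends on the sub-tree visibility."""
    return "shared-namespace" if entry.shared else "private-namespace"
```

For example, a file created with `Location.LOCAL` in a private sub-tree keeps its data on the client, while its metadata still goes to a private namespace container stored in the cloud-of-clouds.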

More information about Overbank and Charon can be found in BiobankCloud deliverables 4.1 and 4.2.