NeIC Sensitive Data Archive
The NeIC Sensitive Data Archive (SDA) is an encrypted data archive for storing sensitive data. It is implemented as a modular microservice system that can be deployed in different configurations depending on the service needs.
The modular architecture of SDA supports both stand-alone deployment of an archive and deployment as a federated node in the Federated European Genome-phenome Archive (FEGA) network, serving sensitive datasets that are discoverable in the main EGA web portal.
Note
Throughout this documentation, Central EGA may be referred to as CEGA or CentralEGA, and any FederatedEGA instance is alternatively known as FEGA, LEGA, or LocalEGA.
Within the context of NeIC, the Federated EGA is denoted as the Sensitive Data Archive or SDA.
Organisation of the NeIC SDA Operations Handbook
This operations handbook is organized in four main parts, each of which has its own main section in the left navigation menu. Here is a condensed summary; follow the links below or use the menu navigation to reach each section's own detailed introduction page:
- Structure: Provides overview material on how the services can be deployed in different constellations and highlights communication paths.
- Communication: Provides more detailed documentation focused on inter-service communication, such as OpenAPI specs for APIs, RabbitMQ message flow, and database information flow details.
- Services: Detailed per-service specifications and documentation.
- Guides: Topic guides on subjects such as "Deployment", "Federated vs. Stand-alone", "Troubleshooting services", etc.
SDA Components and Architecture
The main components and the interaction between them, based on the NeIC Sensitive Data Archive deployment in a FederatedEGA setup, are illustrated in the figure below. The different colored backgrounds represent different zones of separation in the federated deployment.
The components illustrated can be classified by which archive sub-process they take part in:
- Submission: the process of submitting sensitive data and meta-data to the inbox staging area
- Ingestion: the process of verifying uploaded data and securely storing it in archive storage, while synchronizing state and identifier information with CEGA
- Data Retrieval: the process of re-encrypting and staging data for retrieval/download.
Service/component | Description | Archive sub-process |
---|---|---|
Database | A Postgres database with an appropriate schema. It stores the file header, the accession ID, the file path, and checksums, as well as other relevant information. | Submission, Ingestion and Data Retrieval |
MQ | A RabbitMQ message broker with appropriate accounts, exchanges, queues and bindings. We use a federated queue to get messages from CentralEGA's broker and shovels to send answers back. | Submission and Ingestion |
Inbox | Upload service for incoming data, acting as a dropbox. Uses credentials from CentralEGA. | Submission |
Intercept | Relays messages between the queue provided from the federated service and local queues. | Submission and Ingestion |
Ingest | Splits the Crypt4GH header and moves it to the database. The remainder of the file is sent to the storage backend (archive). No cryptographic tasks are done. | Ingestion |
Verify | Using the archive Crypt4GH secret key, this service decrypts the stored files and compares their checksum with the embedded checksum for the unencrypted file. | Ingestion |
Finalize | Handles the so-called Accession ID (stable ID) to filename mappings from CentralEGA. | Ingestion |
Mapper | The mapper service registers the mapping of accessionIDs (stable IDs for files) to datasetIDs. | Ingestion |
Archive (Storage) | Storage backend: can be a regular (POSIX) file system or an S3 object store. | Ingestion and Data Retrieval |
Data Retrieval API | Provides a download/data access API for streaming archived data either in encrypted or decrypted format. | Data Retrieval |
Inbox (Storage) | Storage backend: can be a regular (POSIX) file system or an S3 object store. | Ingestion |
Backup (Storage) | Storage backend: can be a regular (POSIX) file system or an S3 object store. | Ingestion |
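
The Ingest row above describes splitting the Crypt4GH header from the rest of the file without doing any cryptographic work. The following is a minimal Python sketch of that idea, not the SDA implementation. It assumes the header layout as described in the GA4GH Crypt4GH version 1 specification: an 8-byte magic string, a 4-byte little-endian version, a 4-byte little-endian packet count, and then packets whose 4-byte length field counts itself. The function name `split_crypt4gh` and the fake input data are hypothetical and exist only for illustration.

```python
import io
import struct

MAGIC = b"crypt4gh"  # assumed 8-byte magic string at the start of a Crypt4GH file


def split_crypt4gh(stream):
    """Return (header_bytes, stream), with stream positioned at the encrypted body.

    Hypothetical helper: it only walks length fields, so no keys are needed
    and nothing is decrypted.
    """
    preamble = stream.read(16)
    if len(preamble) != 16 or preamble[:8] != MAGIC:
        raise ValueError("not a Crypt4GH stream")
    version, packet_count = struct.unpack("<II", preamble[8:])
    if version != 1:
        raise ValueError(f"unsupported Crypt4GH version {version}")

    header = bytearray(preamble)
    for _ in range(packet_count):
        length_field = stream.read(4)
        if len(length_field) != 4:
            raise ValueError("truncated header")
        (packet_length,) = struct.unpack("<I", length_field)
        if packet_length < 4:
            raise ValueError("invalid packet length")
        # The length field is assumed to include its own 4 bytes.
        payload = stream.read(packet_length - 4)
        if len(payload) != packet_length - 4:
            raise ValueError("truncated header packet")
        header += length_field + payload
    return bytes(header), stream


# Usage with fabricated bytes (not a real encrypted file): one 9-byte packet.
fake_packet = struct.pack("<I", 4 + 5) + b"AAAAA"
fake_file = MAGIC + struct.pack("<II", 1, 1) + fake_packet + b"encrypted-body"
header, body = split_crypt4gh(io.BytesIO(fake_file))
```

In a deployment like the one described in the table, the `header` bytes would correspond to what is stored in the database and the remaining stream to what is sent to the archive storage backend. How the actual services persist each part is outside the scope of this sketch.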