Deploying on Kubernetes

Guide summary

This guide explains how to deploy the Sensitive Data Archive (SDA) on Kubernetes.

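  • What it covers: trust zone considerations, an overview of the Helm charts, installing and uninstalling the charts, system requirements, the minimal chart configuration, and network policies
  • Level of detail: minimal lists of the variables needed for a working deployment; the full set of variables is documented with each chart
  • Self-containedness: the guide relies on Helm's documentation and on each chart's own documentation for details
  • Examples: the examples are not expected to work as-is and must be configured for the target deployment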

Local security / zone considerations

Because deployments differ, concrete examples are hard to give; this section therefore only outlines guidelines.

For a secure deployment, think of the system in terms of what can be accessed from where. Whatever the deployment method, two trust boundaries can be used: external and internal. For an extra layer of security, the storage can also be placed behind its own trust boundary. Since the service is provided to customers on the internet, one way to deploy it is with two separate Kubernetes clusters: one that responds to customers and handles other communication from outside, and another, more secure, internal cluster that faces the storage.

Another consideration is where the data is released; this could be a closed, protected environment with tightly restricted access. If the Data Retrieval API is used to serve unencrypted files, the recommendation is to make it available only in the internal cluster.

The services can be divided between the two trust boundaries:

  • External cluster: Inbox and MQ
  • Internal cluster: Intercept, Ingest, Verify, Mapper, Finalize, Backup and Data Retrieval API

The innermost trust zone contains the database and the archive, which can be accessed only from the internal cluster.

Charts overview

The neicnordic Helm repository contains the following charts (configuration details are documented with each chart):

sda-svc: This chart deploys the service components needed to operate the Sensitive Data Archive (SDA) solution. It may also include additional components that are useful for administrative operations or for extending the SDA solution to other use cases.

sda-db: This chart deploys a pre-configured database (PostgreSQL) instance for the Sensitive Data Archive. The database schemas are designed to adhere to the European Genome-Phenome Archive federated archiving model.

sda-mq: This chart deploys a pre-configured message broker (RabbitMQ) designed for European Genome-Phenome Archive federated messaging between CentralEGA and Local/Federated EGAs, but it can also be configured to support standalone SDA deployments.

sda-orch: This chart deploys an orchestration service for the Sensitive Data Archive solution. This helper service automates the ingestion flow when the SDA solution is deployed and configured as standalone (non-federated). Note: the sda-orch chart may be out of date and is therefore not guaranteed to work.

Usage

Helm must be installed to use the charts. Please refer to Helm's documentation to get started.

With Helm properly installed, add the neicnordic Helm repository as follows:

helm repo add neicnordic https://neicnordic.github.io/sensitive-data-archive
helm repo update

You can then run

helm search repo neicnordic

to see the available charts.

Installing the Charts

To install a chart with the release name my-release:

helm install my-release neicnordic/<chart-name>

To configure a Helm chart with your own values, you can copy the default values.yaml file from the chart to your local directory and modify it as needed, or export it with helm:

helm show values neicnordic/<chart-name> > <values-filename>.yaml

Note that Kubernetes resources, such as secrets, may be required for a chart to function properly. All necessary resources should be created in the Kubernetes cluster before installing the chart.

Then, you can install the chart with the following command:

helm install my-release -f <values-filename>.yaml neicnordic/<chart-name>

Example:

First, create the secret containing the crypt4gh keypair and passphrase before deploying the chart (see the sda-svc chart documentation). Then export and edit the values file, and install the chart:

helm show values neicnordic/sda-svc > my-values.yaml
vi my-values.yaml # modify with your own settings
helm install my-release neicnordic/sda-svc -f my-values.yaml
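As an illustration of that first step, the crypt4gh secret can also be written as a manifest and applied with kubectl apply -f before running helm install. This is a minimal sketch: the secret name c4gh and the entry names c4gh.sec.pem, c4gh.pub.pem and passphrase are hypothetical, and the entries the chart actually reads should be checked in the sda-svc chart documentation. The secret name is what global.c4gh.secretName refers to.

```yaml
# Hypothetical Secret holding the crypt4gh keypair and passphrase.
# All names below are placeholders; they must match what the sda-svc
# values reference (global.c4gh.secretName, global.c4gh.keyFile).
apiVersion: v1
kind: Secret
metadata:
  name: c4gh
type: Opaque
stringData:
  # stringData takes plain text; the API server merges it into the
  # base64-encoded data field when the Secret is written.
  c4gh.sec.pem: |
    <contents of the crypt4gh private key file>
  c4gh.pub.pem: |
    <contents of the crypt4gh public key file>
  passphrase: "<passphrase protecting the private key>"
```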

For a quick reference to Helm's chart management capabilities, see the Helm documentation.

Uninstalling the Chart

To uninstall the my-release deployment:

helm delete my-release

The command removes all the Kubernetes components associated with the chart and deletes the release.

System requirements

  • Kubernetes: version 1.25 or later is required to run the Helm charts
  • Helm: version 3.5 or later is required to run the charts

Resource estimation

  • RabbitMQ: see the official recommended resource requirements for a RabbitMQ cluster
  • PostgreSQL: see the official recommended resource requirements for PostgreSQL

Minimal working configuration

The table below lists the minimum resources required to run the services in the Helm charts.

Service     CPU    Memory  Disk
RabbitMQ    1000m  1Gi     8Gi
PostgreSQL  100m   128Mi   8Gi
intercept   100m   32Mi    -
ingest      100m   128Mi   -
verify      100m   128Mi   -
finalize    100m   128Mi   -
download    100m   256Mi   -
auth        100m   128Mi   -
s3inbox     100m   128Mi   -
sftpinbox   100m   128Mi   -
doa         100m   128Mi   -
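Each row of the table corresponds to a standard Kubernetes container resources stanza. The sketch below fills one in with the minimal figures for ingest; it shows only the generic Kubernetes field, and the key path under which each chart accepts it in its values file is an assumption to verify against the chart documentation.

```yaml
# Generic Kubernetes container resources stanza, using the minimal
# figures from the table for ingest (100m CPU, 128Mi memory).
# "requests" is what the scheduler reserves for the container.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
```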

Chart configuration

Here we provide minimal lists of variables that require configuration, in addition to the defaults, to achieve a working deployment of the Sensitive Data Archive. These variables can be set in the values.yaml file of the respective Helm chart.

In what follows it is assumed that a federated setup is being deployed.

SDA services chart

Below is a minimal list of variables that need to be configured in the values.yaml file of the Helm chart for the Sensitive Data Archive services to achieve a working deployment. All of the chart's variables are documented with the chart.

Global Variables

TLS support
  • global.tls.issuer or global.tls.clusterIssuer: The issuer or cluster issuer for TLS
Storage
  • global.archive.storageType: The storage type for the archive
  • global.backupArchive.storageType: The storage type for the backup archive
  • global.inbox.storageType: The storage type for the inbox

If, for example, these are set to s3, the following variables must be set as well:

  • global.archive.s3Url: The S3 URL for the archive
  • global.archive.s3Bucket: The S3 bucket for the archive
  • global.archive.s3AccessKey: The S3 access key for the archive
  • global.archive.s3SecretKey: The S3 secret key for the archive

  • global.backupArchive.s3Url: The S3 URL for the backup archive
  • global.backupArchive.s3Bucket: The S3 bucket for the backup archive
  • global.backupArchive.s3AccessKey: The S3 access key for the backup archive
  • global.backupArchive.s3SecretKey: The S3 secret key for the backup archive

  • global.inbox.s3Url: The S3 URL for the inbox
  • global.inbox.s3Bucket: The S3 bucket for the inbox
  • global.inbox.s3AccessKey: The S3 access key for the inbox
  • global.inbox.s3SecretKey: The S3 secret key for the inbox
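Put together, the storage settings for an all-S3 setup might look as follows in the sda-svc values file. This is a sketch assuming the key names listed above; the endpoint URL, bucket names and keys are placeholders to replace with your own.

```yaml
# Storage settings for an all-S3 setup. Key names follow the list above;
# the endpoint, bucket names and credentials are placeholders.
global:
  archive:
    storageType: s3
    s3Url: "https://s3.example.org"
    s3Bucket: "archive"
    s3AccessKey: "<archive-access-key>"
    s3SecretKey: "<archive-secret-key>"
  backupArchive:
    storageType: s3
    s3Url: "https://s3.example.org"
    s3Bucket: "backup"
    s3AccessKey: "<backup-access-key>"
    s3SecretKey: "<backup-secret-key>"
  inbox:
    storageType: s3
    s3Url: "https://s3.example.org"
    s3Bucket: "inbox"
    s3AccessKey: "<inbox-access-key>"
    s3SecretKey: "<inbox-secret-key>"
```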
RabbitMQ
  • global.broker.host: The host for the broker
  • global.broker.exchange: The exchange for the broker
  • global.broker.routingError: The routing key used for error messages
  • global.broker.backupRoutingKey: The backup routing key for the broker
Crypt4gh
  • global.c4gh.secretName: The name of the Kubernetes secret holding the crypt4gh key, as referenced in the Helm charts
  • global.c4gh.keyFile: The crypt4gh private key file
  • global.c4gh.passphrase: The passphrase for the crypt4gh private key
CEGA
  • global.cega.host: The host of the Federated EGA NSS API
  • global.cega.user: The user for accessing the Federated EGA NSS API
  • global.cega.password: The password for the Federated EGA NSS API
Database
  • global.db.host: The host for the database
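The remaining global variables can be sketched the same way. Every value below is a hypothetical placeholder in angle brackets, and the nesting simply follows the dotted key names listed above.

```yaml
# Remaining global variables with placeholder values; the nesting
# mirrors the dotted key names listed above.
global:
  tls:
    clusterIssuer: "<cluster-issuer-name>"  # or tls.issuer, see the list above
  broker:
    host: "<broker-host>"
    exchange: "<exchange-name>"
    routingError: "<error-routing-key>"
    backupRoutingKey: "<backup-routing-key>"
  c4gh:
    secretName: "<c4gh-secret-name>"
    keyFile: "<c4gh-private-key-file>"
    passphrase: "<c4gh-passphrase>"
  cega:
    host: "<cega-nss-api-host>"
    user: "<cega-user>"
    password: "<cega-password>"
  db:
    host: "<database-host>"
```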

Service Specific Credentials

Intercept
  • credentials.intercept.mqUser: The message queue user for intercept
  • credentials.intercept.mqPassword: The message queue password for intercept
Ingest
  • credentials.ingest.dbUser: The database user for ingest
  • credentials.ingest.dbPassword: The database password for ingest
  • credentials.ingest.mqUser: The message queue user for ingest
  • credentials.ingest.mqPassword: The message queue password for ingest
Verify
  • credentials.verify.dbUser: The database user for verify
  • credentials.verify.dbPassword: The database password for verify
  • credentials.verify.mqUser: The message queue user for verify
  • credentials.verify.mqPassword: The message queue password for verify
Finalize
  • credentials.finalize.dbUser: The database user for finalize
  • credentials.finalize.dbPassword: The database password for finalize
  • credentials.finalize.mqUser: The message queue user for finalize
  • credentials.finalize.mqPassword: The message queue password for finalize

Backup (only needed to enable the backup functionality)
  • credentials.backup.dbUser: The database user for backup
  • credentials.backup.dbPassword: The database password for backup
  • credentials.backup.mqUser: The message queue user for backup
  • credentials.backup.mqPassword: The message queue password for backup
Mapper
  • credentials.mapper.dbUser: The database user for mapper
  • credentials.mapper.dbPassword: The database password for mapper
  • credentials.mapper.mqUser: The message queue user for mapper
  • credentials.mapper.mqPassword: The message queue password for mapper
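The credentials follow one pattern per service. A sketch with placeholder values for intercept and ingest; verify, finalize, backup and mapper take the same four keys as ingest.

```yaml
# Placeholder credentials; each service has its own entry under
# "credentials", following the keys listed above.
credentials:
  intercept:
    # intercept only needs message queue credentials
    mqUser: "<intercept-mq-user>"
    mqPassword: "<intercept-mq-password>"
  ingest:
    dbUser: "<ingest-db-user>"
    dbPassword: "<ingest-db-password>"
    mqUser: "<ingest-mq-user>"
    mqPassword: "<ingest-mq-password>"
  # verify, finalize, backup and mapper: same four keys as ingest
```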

Minimal configuration for additional services

Ingress
  • global.ingress.deploy: Determines if the ingress should be deployed
  • global.ingress.clusterIssuer: The cluster issuer for the ingress
  • global.ingress.hostName.auth: The hostname for the auth service
  • global.ingress.hostName.download: The hostname for the download service
  • global.ingress.hostName.s3Inbox: The hostname for the S3 inbox
SDA-auth
  • global.auth.jwtSecret: The JWT secret for auth
  • global.auth.jwtKey: The JWT key for auth
  • global.auth.jwtPub: The JWT public key for auth
SDA-download
  • global.download.enabled: Determines whether the download service is enabled
  • credentials.download.dbUser: The database user for download
  • credentials.download.dbPassword: The database password for download
LS-AAI OIDC
  • global.oidc.id: The ID for OIDC
  • global.oidc.secret: The secret for OIDC
S3Inbox
  • credentials.inbox.dbUser: The database user for inbox
  • credentials.inbox.dbPassword: The database password for inbox
  • credentials.inbox.mqUser: The message queue user for inbox
  • credentials.inbox.mqPassword: The message queue password for inbox
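A sketch of these optional settings in the sda-svc values file, nested according to the dotted key names listed above. All values are placeholders; the host names use the reserved example.org domain and must be replaced with your own.

```yaml
# Placeholder values for the additional services.
global:
  ingress:
    deploy: true
    clusterIssuer: "<cluster-issuer-name>"
    hostName:
      auth: "auth.example.org"
      download: "download.example.org"
      s3Inbox: "inbox.example.org"
  auth:
    jwtSecret: "<jwt-secret>"
    jwtKey: "<jwt-key>"
    jwtPub: "<jwt-public-key>"
  download:
    enabled: true
  oidc:
    id: "<ls-aai-client-id>"
    secret: "<ls-aai-client-secret>"
credentials:
  download:
    dbUser: "<download-db-user>"
    dbPassword: "<download-db-password>"
  inbox:
    dbUser: "<inbox-db-user>"
    dbPassword: "<inbox-db-password>"
    mqUser: "<inbox-mq-user>"
    mqPassword: "<inbox-mq-password>"
```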

RabbitMQ chart

Below is a minimal list of variables that need to be configured in the values.yaml file of the RabbitMQ Helm chart to achieve a working deployment. All of the chart's variables are documented with the chart.

  • global.adminUser: The username for the admin user
  • global.adminPassword: The password for the admin user
  • global.shovel.host: The hostname of the shovel server
  • global.shovel.pass: The password to authenticate with the shovel server
  • global.shovel.port: The port on which the shovel server is running
  • global.shovel.user: The username to authenticate with the shovel server
  • global.shovel.vhost: The virtual host on the shovel server
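In values-file form, the RabbitMQ chart settings above might look like this. All values are hypothetical placeholders; the shovel block describes the remote server that the shovel connects to, and the port shown is only an assumption.

```yaml
# Placeholder values for the RabbitMQ chart.
global:
  adminUser: "<admin-user>"
  adminPassword: "<admin-password>"
  shovel:
    host: "<remote-broker-host>"
    port: 5671  # assumption: default AMQPS port; use the port you were given
    user: "<shovel-user>"
    pass: "<shovel-password>"
    vhost: "<remote-vhost>"
```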

Database chart

Below is a minimal list of variables that need to be configured in the values.yaml file of the SDA Database Helm chart to achieve a working deployment. All of the chart's variables are documented with the chart.

  • global.postgresAdminPassword: The password for the postgres admin user
  • global.tls.clusterIssuer: The cluster issuer for TLS
  • global.tls.secretName: The name of the Kubernetes secret for TLS, as referenced in the Helm charts
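The same three variables as a values-file fragment for the SDA Database chart; all values are hypothetical placeholders.

```yaml
# Placeholder values for the SDA Database chart.
global:
  postgresAdminPassword: "<postgres-admin-password>"
  tls:
    clusterIssuer: "<cluster-issuer-name>"
    secretName: "<tls-secret-name>"
```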

Network policies

DNS names and ingress for services

When deploying applications on Kubernetes, it is essential to understand the DNS naming conventions and ingress configurations for Pods and Services. Each Pod within the cluster is assigned a DNS name of the form pod-ip-address.<namespace>.pod.cluster.local, where the dots in the Pod's IP address are replaced by dashes. This DNS resolution allows Pods within the same cluster to communicate with each other.

Services, representing sets of Pods, are assigned A DNS records with names structured as <service_name>.<namespace>.svc.cluster.local. This DNS record resolves to the cluster IP of the respective Service.

Service Name  Common DNS Name
inbox         sda-svc-inbox.<namespace>.svc.cluster.local
download      sda-svc-download.<namespace>.svc.cluster.local
auth          sda-svc-auth.<namespace>.svc.cluster.local
mq            broker-sda-mq.<namespace>.svc.cluster.local
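When the broker runs in the same cluster as the services that connect to it, its in-cluster DNS name can be used directly as global.broker.host in the sda-svc values. A sketch assuming a namespace named sda (hypothetical). In the two-cluster layout described earlier, where MQ sits in the external cluster, an address reachable from the internal cluster is needed instead, since cluster.local names only resolve inside their own cluster.

```yaml
global:
  broker:
    # <service_name>.<namespace>.svc.cluster.local with the
    # hypothetical namespace "sda"
    host: "broker-sda-mq.sda.svc.cluster.local"
```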

Certain services are configured to expect an ingress, which gives external clients access to them. The following services expect an ingress:

  • inbox
  • download
  • auth

In addition, Kubernetes lets you define Network Policies to control communication between Pods. Network Policies are crucial for enforcing security within the cluster: they specify which Pods can communicate with each other and define rules for ingress and egress traffic. Below are recommended basic examples of Network Policies for namespace isolation and for allowing traffic to the inbox, either through an ingress controller or through a NodePort. Similar policies need to be in place for the download and auth services:

# Namespace isolation: Pods may only talk to Pods in the same namespace,
# plus DNS lookups to kube-dns in kube-system.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
spec:
  podSelector: {}
  policyTypes:
  - Egress
  - Ingress
  egress:
  - to:
    - podSelector: {}
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
  ingress:
    - from:
      - podSelector: {}
---
# Allow the ingress-nginx controllers to reach the inbox Pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inbox-ingress-with-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: sda-svc-inbox
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/component: controller
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/component: controller
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx-direct
  policyTypes:
  - Ingress
---
# Allow traffic from any address to the inbox Pods (NodePort exposure).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inbox-ingress-with-nodeport
spec:
  podSelector:
    matchLabels:
      app: sda-svc-inbox
  ingress:
  - from:
    - ipBlock:
        cidr: 0.0.0.0/0
  policyTypes:
  - Ingress
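Following the inbox example, a similar policy for the download service, when it is exposed through an ingress controller, could look like this. The pod label app: sda-svc-download and the ingress-nginx namespace are assumptions modelled on the inbox policy; check the labels on your own Pods (for example with kubectl get pods --show-labels) before applying it. The auth policy is analogous.

```yaml
# Sketch: allow the ingress-nginx controller to reach the download Pods.
# The label app: sda-svc-download is an assumption, not taken from the chart.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: download-ingress-with-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: sda-svc-download
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/component: controller
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
  policyTypes:
  - Ingress
```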

Complementary services