Deploying on Kubernetes
Guide summary
This guide explains how to deploy the Sensitive Data Archive (SDA) on Kubernetes.
- What it intends to cover
- What to expect, scope, explain level of details
- How self-contained the guide is
- Examples expected to work directly or not, must be configured (example configurations, most updated version?)
Local security / zone considerations
Because deployments differ from site to site, concrete examples are hard to give, so this section covers only general guidelines.
To deploy the system securely, consider what can be accessed from where. Every deployment can be organized around two trust boundaries, external and internal. For an extra layer of security, the storage can be placed behind a separate, third trust boundary. Because the service is offered to customers over the internet, one example deployment uses two separate Kubernetes clusters: one that responds to customers and handles other communication from outside, and a second, more secure, internal cluster that faces the storage.
Another thing to consider is where the data is released, which could be a closed, protected environment with tightly restricted access. If the Data Retrieval API is used to serve unencrypted files, the recommendation is to make it available only in an internal cluster.
The services can be divided between the two trust boundaries:
- The services in the external cluster are Inbox and MQ.
- The services in the internal cluster are Intercept, Ingest, Verify, Mapper, Finalize, Backup and Data Retrieval API.
The innermost trust zone contains the database and the archive, which can be accessed only from the internal cluster.
Charts overview
The neicnordic Helm repository contains the following charts (for configuration details, see the documentation of each chart):
- sda-svc: This chart deploys the service components needed to operate the Sensitive Data Archive (SDA) solution. It may include additional service components that might be beneficial for administrative operations or for extending the SDA solution to facilitate different use cases.
- sda-db: This chart deploys a pre-configured database (PostgreSQL) instance for the Sensitive Data Archive. The database schemas are designed to adhere to the European Genome-Phenome Archive federated archiving model.
- sda-mq: This chart deploys a pre-configured message broker (RabbitMQ) designed for European Genome-Phenome Archive federated messaging between CentralEGA and Local/Federated EGAs. It can also be configured to support standalone SDA deployments.
- sda-orch: This chart deploys an orchestration service for the Sensitive Data Archive solution. This is a helper service designed to curate the ingestion flow in an automated manner when the SDA solution is deployed and configured as standalone (non-federated).
Note: The sda-orch chart may be out of date and is thus not guaranteed to be functional.
Usage
Helm must be installed to use the charts. Please refer to Helm's documentation to get started.
With Helm properly installed, add the neicnordic Helm repository as follows:
helm repo add neicnordic https://neicnordic.github.io/sensitive-data-archive
helm repo update
You can then run
helm search repo neicnordic
to see the available charts.
Installing the Charts
To install a chart with the release name my-release:
helm install my-release neicnordic/<chart-name>
To configure a Helm chart with your own values, you can copy the default values.yaml file from the chart to your local directory and modify it as needed, or export the defaults with Helm:
helm show values neicnordic/<chart-name> > <values-filename>.yaml
Note that Kubernetes resources, such as secrets, may be required for a chart to function properly. All necessary resources should be created in the Kubernetes cluster before installing the chart.
Then, you can install the chart with the following command:
helm install my-release -f <values-filename>.yaml neicnordic/<chart-name>
Example:
First create the secret containing the crypt4gh key pair and passphrase, before the chart is deployed. Then export the default values, edit them to your liking, and install the chart:
helm show values neicnordic/sda-svc > my-values.yaml
vi my-values.yaml # modify with your own settings
helm install my-release neicnordic/sda-svc -f my-values.yaml
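The secret mentioned in this example could be sketched as the Kubernetes manifest below. This is only an illustration under stated assumptions: the secret name `c4gh`, the namespace `sda`, and the key names `c4gh.sec.pem` and `passphrase` are hypothetical placeholders and are not taken from the chart. Check the chart's documentation for the names it actually expects.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: c4gh              # hypothetical; should match global.c4gh.secretName
  namespace: sda          # hypothetical namespace of the release
type: Opaque
stringData:
  # Hypothetical key name; should match global.c4gh.keyFile
  c4gh.sec.pem: |
    <contents of your crypt4gh private key file>
  # Hypothetical key name for the passphrase protecting the private key
  passphrase: CHANGE_ME
```

A manifest like this would be applied with `kubectl apply -f` before running `helm install`, so that the secret already exists when the chart's pods start.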
For a quick reference to Helm's chart management capabilities, see the Helm documentation.
Uninstalling the Chart
To uninstall the my-release deployment:
helm delete my-release
The command removes all the Kubernetes components associated with the chart and deletes the release.
System requirements
- The minimum Kubernetes version required to run the Helm charts is >= 1.25
- The minimum Helm version required to run the charts is >= 3.5
Resource estimation
- RabbitMQ - official recommended resource requirements for a RabbitMQ cluster
- PostgreSQL - official recommended resource requirements for PostgreSQL
Minimal working configuration
The table below reflects the minimum required resources to run the services in the helm charts.
| Service | CPU | Memory | Disk |
|---|---|---|---|
| RabbitMQ | 1000m | 1Gi | 8Gi |
| PostgreSQL | 100m | 128Mi | 8Gi |
| intercept | 100m | 32Mi | - |
| ingest | 100m | 128Mi | - |
| verify | 100m | 128Mi | - |
| finalize | 100m | 128Mi | - |
| download | 100m | 256Mi | - |
| auth | 100m | 128Mi | - |
| s3inbox | 100m | 128Mi | - |
| sftpinbox | 100m | 128Mi | - |
| doa | 100m | 128Mi | - |
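As an illustration, the RabbitMQ row of the table corresponds to a container resources stanza along the following lines. This is a generic Kubernetes fragment, not an excerpt from the chart: where it is placed in a values file depends on the chart, and treating the CPU and memory figures as requests (rather than limits) is an assumption.

```yaml
# Generic Kubernetes container resources stanza (placement in values.yaml is chart-specific)
resources:
  requests:
    cpu: 1000m      # RabbitMQ CPU figure from the table
    memory: 1Gi     # RabbitMQ memory figure from the table
```

The disk figure does not belong in this stanza. It would typically be set as the storage size of the persistent volume claim used by the service.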
Chart configuration
Here we provide minimal lists of the variables that must be configured, in addition to the defaults, to achieve a working deployment of the Sensitive Data Archive. These variables can be set in the respective values.yaml file of each Helm chart.
In what follows it is assumed that a federated setup is being deployed.
SDA services chart
Below is a minimal list of variables that need to be configured in the values.yaml file of the Helm chart for the Sensitive Data Archive services in order to achieve a working deployment. All of the chart's variables are described in detail in the chart's documentation.
Global Variables
TLS support
- global.tls.issuer or global.tls.clusterIssuer: The issuer or cluster issuer for TLS
Storage
- global.archive.storageType: The storage type for the archive
- global.backupArchive.storageType: The storage type for the backup archive
- global.inbox.storageType: The storage type for the inbox
If, for example, the above are set to s3, then the following variables need to be set as well:
- global.archive.s3Url: The S3 URL for the archive
- global.archive.s3Bucket: The S3 bucket for the archive
- global.archive.s3AccessKey: The S3 access key for the archive
- global.archive.s3SecretKey: The S3 secret key for the archive
- global.backupArchive.s3Url: The S3 URL for the backup archive
- global.backupArchive.s3Bucket: The S3 bucket for the backup archive
- global.backupArchive.s3AccessKey: The S3 access key for the backup archive
- global.backupArchive.s3SecretKey: The S3 secret key for the backup archive
- global.inbox.s3Url: The S3 URL for the inbox
- global.inbox.s3Bucket: The S3 bucket for the inbox
- global.inbox.s3AccessKey: The S3 access key for the inbox
- global.inbox.s3SecretKey: The S3 secret key for the inbox
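Written out as a values file fragment, the TLS and S3 storage variables above nest as shown below. The key paths come from the list above. Every value (issuer name, URLs, bucket names, credentials) is a hypothetical placeholder that must be replaced with your own settings.

```yaml
global:
  tls:
    clusterIssuer: my-cluster-issuer    # placeholder issuer name
  archive:
    storageType: s3
    s3Url: https://s3.example.org       # placeholder endpoint
    s3Bucket: sda-archive               # placeholder bucket
    s3AccessKey: CHANGE_ME
    s3SecretKey: CHANGE_ME
  backupArchive:
    storageType: s3
    s3Url: https://s3-backup.example.org
    s3Bucket: sda-backup
    s3AccessKey: CHANGE_ME
    s3SecretKey: CHANGE_ME
  inbox:
    storageType: s3
    s3Url: https://s3.example.org
    s3Bucket: sda-inbox
    s3AccessKey: CHANGE_ME
    s3SecretKey: CHANGE_ME
```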
RabbitMQ
- global.broker.host: The host for the broker
- global.broker.exchange: The exchange for the broker
- global.broker.routingError: The routing error for the broker
- global.broker.backupRoutingKey: The backup routing key for the broker
Crypt4gh
- global.c4gh.secretName: The name by which the kubernetes secret for c4gh is referenced in the Helm charts
- global.c4gh.keyFile: The crypt4gh private key file
- global.c4gh.passphrase: The passphrase for c4gh
CEGA
- global.cega.host: The host for Federated EGA NSS API
- global.cega.user: The user for accessing Federated EGA NSS API
- global.cega.password: The password for Federated EGA NSS API
Database
- global.db.host: The host for the database
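The broker, Crypt4gh, CEGA and database variables above could be set as in the following sketch. Only the key paths are taken from the list. All values are hypothetical placeholders, including the host names, the exchange and routing key names, and the secret and key file names.

```yaml
global:
  broker:
    host: broker-sda-mq            # placeholder; use the DNS name of your broker service
    exchange: sda                  # placeholder exchange name
    routingError: error            # placeholder routing key
    backupRoutingKey: backup       # placeholder routing key
  c4gh:
    secretName: c4gh               # placeholder; name of the pre-created secret
    keyFile: c4gh.sec.pem          # placeholder; key file name inside the secret
    passphrase: CHANGE_ME
  cega:
    host: https://cega.example.org # placeholder NSS API endpoint
    user: CHANGE_ME
    password: CHANGE_ME
  db:
    host: sda-db                   # placeholder; use the DNS name of your database service
```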
Service Specific Credentials
Intercept
- credentials.intercept.mqUser: The message queue user for intercept
- credentials.intercept.mqPassword: The message queue password for intercept
Ingest
- credentials.ingest.dbUser: The database user for ingest
- credentials.ingest.dbPassword: The database password for ingest
- credentials.ingest.mqUser: The message queue user for ingest
- credentials.ingest.mqPassword: The message queue password for ingest
Verify
- credentials.verify.dbUser: The database user for verify
- credentials.verify.dbPassword: The database password for verify
- credentials.verify.mqUser: The message queue user for verify
- credentials.verify.mqPassword: The message queue password for verify
Finalize
- credentials.finalize.dbUser: The database user for finalize
- credentials.finalize.dbPassword: The database password for finalize
- credentials.finalize.mqUser: The message queue user for finalize
- credentials.finalize.mqPassword: The message queue password for finalize
To enable Backup functionality:
- credentials.backup.dbUser: The database user for backup
- credentials.backup.dbPassword: The database password for backup
- credentials.backup.mqUser: The message queue user for backup
- credentials.backup.mqPassword: The message queue password for backup
Mapper
- credentials.mapper.dbUser: The database user for mapper
- credentials.mapper.dbPassword: The database password for mapper
- credentials.mapper.mqUser: The message queue user for mapper
- credentials.mapper.mqPassword: The message queue password for mapper
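In a values file, the service-specific credentials above all live under one top-level `credentials` key. The sketch below shows the nesting. The user names are hypothetical placeholders chosen to mirror the service names, and the passwords must be replaced.

```yaml
credentials:
  intercept:
    mqUser: intercept        # placeholder user name
    mqPassword: CHANGE_ME
  ingest:
    dbUser: ingest
    dbPassword: CHANGE_ME
    mqUser: ingest
    mqPassword: CHANGE_ME
  verify:
    dbUser: verify
    dbPassword: CHANGE_ME
    mqUser: verify
    mqPassword: CHANGE_ME
  finalize:
    dbUser: finalize
    dbPassword: CHANGE_ME
    mqUser: finalize
    mqPassword: CHANGE_ME
  backup:                    # only needed when Backup functionality is enabled
    dbUser: backup
    dbPassword: CHANGE_ME
    mqUser: backup
    mqPassword: CHANGE_ME
  mapper:
    dbUser: mapper
    dbPassword: CHANGE_ME
    mqUser: mapper
    mqPassword: CHANGE_ME
```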
Minimal configuration for additional services
Ingress
- global.ingress.deploy: Determines if the ingress should be deployed
- global.ingress.clusterIssuer: The cluster issuer for the ingress
- global.ingress.hostName.auth: The hostname for the auth
- global.ingress.hostName.download: The hostname for the download
- global.ingress.hostName.s3Inbox: The hostname for the S3 Inbox
SDA-auth
- global.auth.jwtSecret: The JWT secret for auth
- global.auth.jwtKey: The JWT key for auth
- global.auth.jwtPub: The JWT public key for auth
SDA-download
- global.download.enabled: Determines if the download is enabled
- credentials.download.dbUser: The database user for download
- credentials.download.dbPassword: The database password for download
LS-AAI OIDC
- global.oidc.id: The ID for OIDC
- global.oidc.secret: The secret for OIDC
S3Inbox
- credentials.inbox.dbUser: The database user for inbox
- credentials.inbox.dbPassword: The database password for inbox
- credentials.inbox.mqUser: The message queue user for inbox
- credentials.inbox.mqPassword: The message queue password for inbox
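The variables for the additional services could be combined as in the sketch below. The key paths follow the lists above, while all values are hypothetical placeholders. In particular, it is an assumption here that `jwtSecret` names a Kubernetes secret and that `jwtKey` and `jwtPub` are file names inside it.

```yaml
global:
  ingress:
    deploy: true
    clusterIssuer: my-cluster-issuer   # placeholder issuer name
    hostName:
      auth: auth.example.org           # placeholder hostnames
      download: download.example.org
      s3Inbox: inbox.example.org
  auth:
    jwtSecret: jwt-keys                # assumed to be a secret name (placeholder)
    jwtKey: jwt.key                    # assumed to be a file name (placeholder)
    jwtPub: jwt.pub                    # assumed to be a file name (placeholder)
  download:
    enabled: true
  oidc:
    id: CHANGE_ME
    secret: CHANGE_ME
credentials:
  download:
    dbUser: download                   # placeholder user name
    dbPassword: CHANGE_ME
  inbox:
    dbUser: inbox
    dbPassword: CHANGE_ME
    mqUser: inbox
    mqPassword: CHANGE_ME
```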
RabbitMQ chart
Below is a minimal list of variables that need to be configured in the values.yaml file of the RabbitMQ Helm chart in order to achieve a working deployment. All of the chart's variables are described in detail in the chart's documentation.
- global.adminUser: The username for the admin user
- global.adminPassword: The password for the admin user
- global.shovel.host: The hostname of the shovel server
- global.shovel.pass: The password to authenticate with the shovel server
- global.shovel.port: The port on which the shovel server is running
- global.shovel.user: The username to authenticate with the shovel server
- global.shovel.vhost: The virtual host on the shovel server
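A values file fragment for the RabbitMQ chart might look like the sketch below. The key paths are those listed above, and every value is a hypothetical placeholder. The port shown is the conventional AMQP-over-TLS port and is an assumption about the remote server.

```yaml
global:
  adminUser: admin                 # placeholder user name
  adminPassword: CHANGE_ME
  shovel:
    host: cega-mq.example.org      # placeholder remote broker host
    port: 5671                     # assumption: standard AMQPS port
    user: CHANGE_ME
    pass: CHANGE_ME
    vhost: CHANGE_ME
```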
Database chart
Below is a minimal list of variables that need to be configured in the values.yaml file of the SDA Database Helm chart in order to achieve a working deployment. All of the chart's variables are described in detail in the chart's documentation.
- global.postgresAdminPassword: The password for the postgres admin user
- global.tls.clusterIssuer: The cluster issuer for TLS
- global.tls.secretName: The name by which the kubernetes secret for TLS is referenced in the Helm charts
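For the database chart, the three variables above could be set as follows. The issuer and secret names are hypothetical placeholders. In practice you would presumably set either the cluster issuer or the secret name depending on how your TLS certificates are provisioned, but that is an assumption to verify against the chart's documentation.

```yaml
global:
  postgresAdminPassword: CHANGE_ME
  tls:
    clusterIssuer: my-cluster-issuer   # placeholder issuer name
    secretName: sda-db-tls             # placeholder TLS secret name
```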
Network policies
- DNS names and ingress for services
When deploying applications on Kubernetes, it is essential to understand the DNS naming conventions and ingress configurations for Pods and Services. Each Pod within the cluster is assigned a DNS name of the form <pod-ip-address>.<namespace>.pod.cluster.local, where the dots in the IP address are replaced by dashes. This DNS resolution allows Pods within the same cluster to communicate with each other.
Services, representing sets of Pods, are assigned DNS A records with names structured as <service_name>.<namespace>.svc.cluster.local. Each record resolves to the cluster IP of the respective Service.
| Service Name | Common DNS Name |
|---|---|
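| inbox | sda-svc-inbox.<namespace>.svc.cluster.local |
| download | sda-svc-download.<namespace>.svc.cluster.local |
| auth | sda-svc-auth.<namespace>.svc.cluster.local |
| mq | broker-sda-mq.<namespace>.svc.cluster.local |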
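To make the naming rule concrete, the sketch below shows a Service manifest whose name and namespace determine its DNS record. The charts create their own Services, so this manifest is purely illustrative: the namespace `sda`, the pod label, and both port numbers are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sda-svc-inbox     # first label of the DNS name
  namespace: sda          # hypothetical namespace; second label of the DNS name
spec:
  selector:
    app: sda-svc-inbox    # hypothetical pod label
  ports:
    - port: 443           # hypothetical service port
      targetPort: 8000    # hypothetical container port
```

With the default cluster domain, other Pods would reach this Service as sda-svc-inbox.sda.svc.cluster.local, or simply as sda-svc-inbox from within the same namespace.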
Some services are configured to expect an ingress, which provides external clients with access to them. The following services expect an ingress:
- inbox
- download
- auth
In addition, Kubernetes allows you to define Network Policies to control the communication between Pods. Network Policies are crucial for enforcing security measures within the cluster. They specify which Pods can communicate with each other and define rules for ingress and egress traffic.
The following basic examples show recommended Network Policies for namespace isolation and for allowing traffic to the inbox, either through an ingress controller or through a NodePort. Similar policies need to be in place for the download and auth services:
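# Namespace isolation: Pods may only talk to each other and to cluster DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
spec:
  podSelector: {}
  policyTypes:
    - Egress
    - Ingress
  egress:
    # Allow traffic to Pods in the same namespace
    - to:
        - podSelector: {}
    # Allow DNS lookups via kube-dns in the kube-system namespace
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
  ingress:
    # Allow traffic from Pods in the same namespace
    - from:
        - podSelector: {}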
---
# Allow traffic to the inbox from the ingress controller Pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inbox-ingress-with-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: sda-svc-inbox
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: controller
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: controller
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx-direct
  policyTypes:
    - Ingress
---
# Allow traffic to the inbox from any address (inbox exposed through a NodePort).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inbox-ingress-with-nodeport
spec:
  podSelector:
    matchLabels:
      app: sda-svc-inbox
  ingress:
    - from:
        - ipBlock:
            cidr: 0.0.0.0/0
  policyTypes:
    - Ingress
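A corresponding policy for the download service could be sketched by analogy with the inbox policy for the ingress controller. The pod label `app: sda-svc-download` is an assumption derived from the inbox example, and the `ingress-nginx` namespace is copied from it as well. Adjust both to match the labels and namespaces actually used in your cluster. The auth service would need an equivalent policy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: download-ingress-with-ingress-controller   # hypothetical policy name
spec:
  podSelector:
    matchLabels:
      app: sda-svc-download                        # assumed label, by analogy with the inbox
  ingress:
    # Allow traffic only from the ingress controller Pods
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: controller
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
  policyTypes:
    - Ingress
```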
Complementary services
- sda-auth, sda-doa, sda-download