Deploying on Kubernetes
Guide summary
This guide explains how to deploy the Sensitive Data Archive (SDA) on Kubernetes.
- What it intends to cover
- What to expect, scope, explain level of details
- How self-contained the guide is
- Examples expected to work directly or not, must be configured (example configurations, most updated version?)
Local security / zone considerations
Because deployments differ, concrete examples are difficult to give, so only general guidelines are described here.
For a secure deployment, consider what can be accessed from where. Every deployment can use two trust boundaries, external and internal. For an extra layer of security, the storage can be placed in a separate trust boundary as well. Because the service is provided to customers over the internet, one example deployment uses two separate Kubernetes clusters: one that responds to customers and handles other communication from outside, and another, more secure, internal cluster that faces the storage.
Another consideration is where the data is released; this could be a closed, protected environment with tightly restricted access. If the Data Retrieval API is used to serve unencrypted files, the recommendation is to make it available only in an internal cluster.
The services can be divided between the two trust boundaries:
- The services in the external cluster are Inbox and MQ
- The services in the internal cluster are Intercept, Ingest, Verify, Mapper, Finalize, Backup and Data Retrieval API
The innermost trust zone contains the database and the archive, which can be accessed only from the internal cluster.
Charts overview
sda-db - Database component for Sensitive Data Archive (SDA) installation
This chart deploys a pre-configured database (PostgreSQL) instance for the Sensitive Data Archive. The database schemas are designed to adhere to the European Genome-Phenome Archive federated archiving model.
sda-mq - RabbitMQ component for Sensitive Data Archive (SDA) installation
This chart deploys a pre-configured message broker (RabbitMQ) designed to work with the European Genome-Phenome Archive federated messaging interface between CentralEGA and Local/Federated EGAs.
sda-svc - Components for Sensitive Data Archive (SDA) installation
This chart deploys the service components needed to operate the Sensitive Data Archive solution for running a Federated EGA node. The charts may include additional service components that might be beneficial for administrative operations or extending the Sensitive Data Archive solutions to facilitate other use cases.
System requirements
- Kubernetes: the minimum version required to run the Helm charts is 1.25
- Helm: the minimum version required to run the charts is 3.5
Resource estimation
- RabbitMQ - official recommended resource requirements for a RabbitMQ cluster
- PostgreSQL - official recommended resource requirements for PostgreSQL
Minimal working configuration
The table below reflects the minimum required resources to run the services in the helm charts.
Service | CPU | Memory | Disk |
---|---|---|---|
RabbitMQ | 1000m | 1Gi | 8Gi |
PostgreSQL | 100m | 128Mi | 8Gi |
intercept | 100m | 32Mi | - |
ingest | 100m | 128Mi | - |
verify | 100m | 128Mi | - |
finalize | 100m | 128Mi | - |
download | 100m | 256Mi | - |
auth | 100m | 128Mi | - |
s3inbox | 100m | 128Mi | - |
sftpinbox | 100m | 128Mi | - |
doa | 100m | 128Mi | - |
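As an illustration of how one row of the table translates into Kubernetes terms, the following is a minimal sketch of a standard container resources stanza using the ingest figures. Where such a stanza is placed in each chart's values file is not stated in this guide and would need to be checked against the chart documentation; the limits shown are hypothetical and simply mirror the requests.

```yaml
# Standard Kubernetes container resources stanza (part of a Pod spec).
# Figures mirror the "ingest" row of the table above.
resources:
  requests:
    cpu: 100m       # 0.1 CPU core
    memory: 128Mi
  limits:           # hypothetical: limits set equal to the requests
    cpu: 100m
    memory: 128Mi
```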
The sections below list, for each of the Helm charts, the minimal set of variables that must be configured in the respective values.yml file, in addition to the defaults, in order to achieve a working deployment of the Sensitive Data Archive. In the following it is assumed that a federated setup is being deployed.
SDA services chart
Below is a minimal list of variables that need to be configured in the values.yml file of the Helm chart for the Sensitive Data Archive services in order to achieve a working deployment. Detailed documentation on all of the chart's variables can be found here.
Global Variables
TLS support
- global.tls.issuer or global.tls.clusterIssuer: The issuer or cluster issuer for TLS
Storage
- global.archive.storageType: The storage type for the archive
- global.backupArchive.storageType: The storage type for the backup archive
- global.inbox.storageType: The storage type for the inbox
If, for example, the above are set to s3, then the following variables need to be set as well:
- global.archive.s3Url: The S3 URL for the archive
- global.archive.s3Bucket: The S3 bucket for the archive
- global.archive.s3AccessKey: The S3 access key for the archive
- global.archive.s3SecretKey: The S3 secret key for the archive
- global.backupArchive.s3Url: The S3 URL for the backup archive
- global.backupArchive.s3Bucket: The S3 bucket for the backup archive
- global.backupArchive.s3AccessKey: The S3 access key for the backup archive
- global.backupArchive.s3SecretKey: The S3 secret key for the backup archive
- global.inbox.s3Url: The S3 URL for the inbox
- global.inbox.s3Bucket: The S3 bucket for the inbox
- global.inbox.s3AccessKey: The S3 access key for the inbox
- global.inbox.s3SecretKey: The S3 secret key for the inbox
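As a rough illustration, the TLS and storage variables could be laid out in values.yml as follows. This is only a sketch: the nesting is inferred from the dotted variable names, and the issuer name, URLs, bucket names and keys are placeholders rather than values taken from the chart.

```yaml
global:
  tls:
    # Set either issuer or clusterIssuer; the name is a placeholder.
    clusterIssuer: my-cluster-issuer
  archive:
    storageType: s3
    s3Url: https://s3.example.org      # placeholder endpoint
    s3Bucket: sda-archive              # placeholder bucket name
    s3AccessKey: CHANGE_ME
    s3SecretKey: CHANGE_ME
  backupArchive:
    storageType: s3
    s3Url: https://s3-backup.example.org
    s3Bucket: sda-backup
    s3AccessKey: CHANGE_ME
    s3SecretKey: CHANGE_ME
  inbox:
    storageType: s3
    s3Url: https://s3.example.org
    s3Bucket: sda-inbox
    s3AccessKey: CHANGE_ME
    s3SecretKey: CHANGE_ME
```

In practice the access and secret keys would usually be supplied from a separate, access-restricted values file rather than committed alongside the rest of the configuration.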
RabbitMQ
- global.broker.host: The host for the broker
- global.broker.exchange: The exchange for the broker
- global.broker.routingError: The routing error for the broker
- global.broker.backupRoutingKey: The backup routing key for the broker
Crypt4gh
- global.c4gh.secretName: The name by which the Kubernetes secret for c4gh is referenced in the Helm charts
- global.c4gh.keyFile: The crypt4gh private key file
- global.c4gh.passphrase: The passphrase for c4gh
CEGA
- global.cega.host: The host for Federated EGA NSS API
- global.cega.user: The user for accessing Federated EGA NSS API
- global.cega.password: The password for Federated EGA NSS API
Database
- global.db.host: The host for the database
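The broker, Crypt4gh, CEGA and database variables might be combined in values.yml roughly as sketched below. The nesting is inferred from the dotted variable names; every value is a placeholder chosen for illustration (the broker host reuses the broker-sda-mq name listed under Network policies), so none of them should be read as chart defaults.

```yaml
global:
  broker:
    host: broker-sda-mq            # service name of the RabbitMQ deployment
    exchange: sda                  # placeholder exchange name
    routingError: error            # placeholder value
    backupRoutingKey: backup       # placeholder value
  c4gh:
    secretName: c4gh-keys          # placeholder Kubernetes secret name
    keyFile: c4gh.sec.pem          # placeholder private key file name
    passphrase: CHANGE_ME
  cega:
    host: https://cega.example.org # placeholder NSS API endpoint
    user: CHANGE_ME
    password: CHANGE_ME
  db:
    host: sda-db                   # placeholder database host name
```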
Service Specific Credentials
Intercept
- credentials.intercept.mqUser: The message queue user for intercept
- credentials.intercept.mqPassword: The message queue password for intercept
Ingest
- credentials.ingest.dbUser: The database user for ingest
- credentials.ingest.dbPassword: The database password for ingest
- credentials.ingest.mqUser: The message queue user for ingest
- credentials.ingest.mqPassword: The message queue password for ingest
Verify
- credentials.verify.dbUser: The database user for verify
- credentials.verify.dbPassword: The database password for verify
- credentials.verify.mqUser: The message queue user for verify
- credentials.verify.mqPassword: The message queue password for verify
Finalize
- credentials.finalize.dbUser: The database user for finalize
- credentials.finalize.dbPassword: The database password for finalize
- credentials.finalize.mqUser: The message queue user for finalize
- credentials.finalize.mqPassword: The message queue password for finalize
To enable Backup functionality:
- credentials.backup.dbUser: The database user for backup
- credentials.backup.dbPassword: The database password for backup
- credentials.backup.mqUser: The message queue user for backup
- credentials.backup.mqPassword: The message queue password for backup
Mapper
- credentials.mapper.dbUser: The database user for mapper
- credentials.mapper.dbPassword: The database password for mapper
- credentials.mapper.mqUser: The message queue user for mapper
- credentials.mapper.mqPassword: The message queue password for mapper
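The service-specific credentials all follow the same pattern, so a values.yml fragment could look roughly like the sketch below. The nesting is inferred from the dotted variable names, and the user names and passwords are placeholders.

```yaml
credentials:
  intercept:
    mqUser: intercept        # placeholder user name
    mqPassword: CHANGE_ME
  ingest:
    dbUser: ingest           # placeholder user name
    dbPassword: CHANGE_ME
    mqUser: ingest
    mqPassword: CHANGE_ME
  # verify, finalize, mapper and (if backup is enabled) backup
  # take the same four keys as ingest.
```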
Minimal configuration for additional services
Ingress
- global.ingress.deploy: Determines if the ingress should be deployed
- global.ingress.clusterIssuer: The cluster issuer for the ingress
- global.ingress.hostName.auth: The hostname for auth
- global.ingress.hostName.download: The hostname for download
- global.ingress.hostName.s3Inbox: The hostname for the S3 Inbox
SDA-auth
- global.auth.jwtSecret: The JWT secret for auth
- global.auth.jwtKey: The JWT key for auth
- global.auth.jwtPub: The JWT public key for auth
SDA-download
- global.download.enabled: Determines if the download is enabled
- credentials.download.dbUser: The database user for download
- credentials.download.dbPassword: The database password for download
LS-AAI OIDC
- global.oidc.id: The ID for OIDC
- global.oidc.secret: The secret for OIDC
S3Inbox
- credentials.inbox.dbUser: The database user for inbox
- credentials.inbox.dbPassword: The database password for inbox
- credentials.inbox.mqUser: The message queue user for inbox
- credentials.inbox.mqPassword: The message queue password for inbox
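A possible values.yml fragment for the additional services is sketched below. The nesting is inferred from the dotted variable names; host names, the issuer name and all secrets are placeholders, and the expected format of the JWT and OIDC values should be checked against the chart documentation.

```yaml
global:
  ingress:
    deploy: true
    clusterIssuer: my-cluster-issuer     # placeholder issuer name
    hostName:
      auth: auth.sda.example.org         # placeholder host names
      download: download.sda.example.org
      s3Inbox: inbox.sda.example.org
  auth:
    jwtSecret: CHANGE_ME                 # placeholders; see chart docs for format
    jwtKey: CHANGE_ME
    jwtPub: CHANGE_ME
  download:
    enabled: true
  oidc:
    id: CHANGE_ME
    secret: CHANGE_ME
credentials:
  download:
    dbUser: download                     # placeholder user name
    dbPassword: CHANGE_ME
  inbox:
    dbUser: inbox
    dbPassword: CHANGE_ME
    mqUser: inbox
    mqPassword: CHANGE_ME
```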
RabbitMQ chart
Below is a minimal list of variables that need to be configured in the values.yml file of the RabbitMQ Helm chart in order to achieve a working deployment. Detailed documentation on all of the chart's variables can be found here.
- global.adminUser: The username for the admin user
- global.adminPassword: The password for the admin user
- global.shovel.host: The hostname of the shovel server
- global.shovel.pass: The password to authenticate with the shovel server
- global.shovel.port: The port on which the shovel server is running
- global.shovel.user: The username to authenticate with the shovel server
- global.shovel.vhost: The virtual host on the shovel server
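For the RabbitMQ chart, the variables listed here might be written in values.yml as in the following sketch. The nesting is inferred from the dotted variable names, and the host, port, vhost and credentials are placeholders for the values supplied by the remote (for example CentralEGA) broker operator.

```yaml
global:
  adminUser: admin               # placeholder admin user name
  adminPassword: CHANGE_ME
  shovel:
    host: mq.cega.example.org    # placeholder remote broker host
    port: 5671                   # placeholder; 5671 is the usual AMQP-over-TLS port
    user: CHANGE_ME
    pass: CHANGE_ME
    vhost: CHANGE_ME             # placeholder virtual host
```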
Database chart
Below is a minimal list of variables that need to be configured in the values.yml file of the SDA Database Helm chart in order to achieve a working deployment. Detailed documentation on all of the chart's variables can be found here.
- global.postgresAdminPassword: The password for the postgres admin user
- global.tls.clusterIssuer: The cluster issuer for TLS
- global.tls.secretName: The name by which the Kubernetes secret for TLS is referenced in the Helm charts
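A corresponding values.yml fragment for the database chart could look like the sketch below. The nesting is inferred from the dotted variable names, and the issuer and secret names are placeholders.

```yaml
global:
  postgresAdminPassword: CHANGE_ME
  tls:
    clusterIssuer: my-cluster-issuer   # placeholder issuer name
    secretName: sda-db-tls             # placeholder Kubernetes secret name
```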
Network policies
- DNS names and ingress for services
When deploying applications on Kubernetes, it is essential to understand the DNS naming conventions and ingress configurations for Pods and Services. Each Pod within the cluster is assigned a DNS name of the form pod-ip-address.<namespace>.pod.cluster.local, where pod-ip-address is the Pod's IP address with the dots replaced by dashes and cluster.local is the default cluster domain. This DNS resolution allows communication between Pods within the same cluster.
Services, representing sets of Pods, are assigned DNS A records with names structured as <service_name>.<namespace>.svc.cluster.local. This DNS record resolves to the cluster IP of the respective Service.
Service Name | Common DNS Name |
---|---|
inbox | sda-svc-inbox. |
download | sda-svc-download. |
auth | sda-svc-auth. |
mq | broker-sda-mq. |
Certain services, such as inbox, download, and auth, are configured to expect an ingress. An ingress provides external access to these services, allowing external clients to communicate with them. The following services specifically expect an ingress:
- inbox
- download
- auth
In addition, Kubernetes allows Network Policies to be defined to control the communication between Pods. Network Policies are crucial for enforcing security measures within the cluster. They specify which Pods can communicate with each other and define rules for ingress and egress traffic.
The following are recommended basic examples of Network Policies: one for namespace isolation, and one for allowing traffic to the inbox ingress, shown in two variants (through an ingress controller and through a NodePort). Similar policies need to be in place for the download and auth services:
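# Isolate the namespace: Pods may only talk to each other and to cluster DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
spec:
  podSelector: {}
  policyTypes:
  - Egress
  - Ingress
  egress:
  # Allow traffic to any Pod in the same namespace.
  - to:
    - podSelector: {}
  # Allow DNS lookups against kube-dns in kube-system.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  ingress:
  # Allow traffic from any Pod in the same namespace.
  - from:
    - podSelector: {}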
---
# Allow the ingress controller Pods to reach the inbox.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inbox-ingress-with-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: sda-svc-inbox
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/component: controller
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/component: controller
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx-direct
  policyTypes:
  - Ingress
---
# Allow traffic from any address to the inbox when it is exposed through a NodePort.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inbox-ingress-with-nodeport
spec:
  podSelector:
    matchLabels:
      app: sda-svc-inbox
  ingress:
  - from:
    - ipBlock:
        cidr: 0.0.0.0/0
  policyTypes:
  - Ingress
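Since similar policies are needed for the download and auth services, a policy for download could be sketched by analogy with the inbox example. The Pod label app: sda-svc-download and the policy name are assumptions derived from the inbox policy and the service naming used in this guide; they should be verified against the labels of the deployed Pods.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: download-ingress-with-ingress-controller   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: sda-svc-download    # assumed label, by analogy with sda-svc-inbox
  policyTypes:
  - Ingress
  ingress:
  # Allow only the ingress controller Pods in the ingress-nginx namespace.
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/component: controller
```

The same shape would apply to auth, with the Pod label changed accordingly.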
Complementary services
- sda-auth, sda-doa, sda-download