mapper Service
The mapper service registers the mapping of accessionIDs (stable IDs for files) to datasetIDs. Once a file's accessionID has been mapped to a datasetID, the file is removed from the inbox.
Service Description
The mapper service maps file accessionIDs to datasetIDs.
When running, mapper reads messages from the configured RabbitMQ queue (commonly: mappings).
For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):
- The message is validated as valid JSON that matches the dataset-mapping schema.
  - If the message can’t be validated, it is discarded and an error message is logged.
- AccessionIDs from the message are mapped to a datasetID (also in the message) in the database.
  - On error, the service sleeps for up to 5 minutes to allow for database recovery. After 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
- The uploaded files related to each AccessionID are removed from the inbox.
  - If this fails, an error is written to the logs.
- The RabbitMQ message is Ack'ed.
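The steps above can be sketched roughly as follows. This is an illustrative Python outline, not the service's actual code: the function names, the `MappingError` type, the message field names (`dataset_id`, `accession_ids`) and the 10-second retry delay are all assumptions made for the example, and the sketch assumes that a failed inbox removal is only logged before the message is Ack'ed.

```python
import json
import time


class MappingError(Exception):
    """Raised by the hypothetical database callback when mapping fails."""


def validate(body):
    """Return the parsed message, or None if it is not a usable mapping message.

    The field names checked here are assumptions for this sketch; the real
    service validates against the dataset-mapping JSON schema.
    """
    try:
        msg = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(msg, dict) or not isinstance(msg.get("dataset_id"), str):
        return None
    ids = msg.get("accession_ids")
    if not isinstance(ids, list) or not ids:
        return None
    if not all(isinstance(i, str) for i in ids):
        return None
    return msg


def handle(body, map_files, remove_from_inbox, log,
           retry_window=300.0, retry_delay=10.0,
           sleep=time.sleep, clock=time.monotonic):
    """Process one message and return "ack", "nack" or "discard"."""
    msg = validate(body)
    if msg is None:
        log("error: message failed validation, discarding")
        return "discard"

    # Retry the database mapping for up to retry_window seconds (5 minutes).
    deadline = clock() + retry_window
    while True:
        try:
            map_files(msg["dataset_id"], msg["accession_ids"])
            break
        except MappingError as err:
            if clock() >= deadline:
                log(f"error: mapping failed, re-queueing message: {err}")
                return "nack"
            sleep(retry_delay)

    for accession_id in msg["accession_ids"]:
        try:
            remove_from_inbox(accession_id)
        except OSError as err:
            # Assumption: a failed removal is logged and does not block the Ack.
            log(f"error: could not remove {accession_id} from inbox: {err}")
    return "ack"
```

Passing `sleep` and `clock` as parameters keeps the retry window testable without actually waiting five minutes.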
Communication
- Mapper reads messages from one RabbitMQ queue (commonly: mappings).
- Mapper maps files to datasets in the database using the MapFilesToDataset function.
- Mapper retrieves the inbox filepath from the database for each file using the GetInboxPath function.
- Mapper sets the status of a dataset in the database using the UpdateDatasetEvent function.
- Mapper removes data from inbox storage.
Configuration
There are a number of options that can be set for the mapper service.
These settings can be set by mounting a YAML file at /config.yaml, for example:
log:
  level: "debug"
  format: "json"
They may also be set using environment variables like:
export LOG_LEVEL="debug"
export LOG_FORMAT="json"
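The two examples suggest a naming rule: a nested YAML key such as log.level corresponds to the environment variable LOG_LEVEL. The Python sketch below illustrates that rule only. It is inferred from the examples in this document rather than taken from the service's configuration loader, and the helper name `env_names` is hypothetical.

```python
def env_names(config, prefix=()):
    """Yield (ENV_NAME, value) pairs for every leaf of a nested settings dict.

    Assumed rule: join the key path with underscores and upper-case it.
    """
    for key, value in config.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from env_names(value, path)
        else:
            yield "_".join(path).upper(), value


settings = {"log": {"level": "debug", "format": "json"}}
for name, value in env_names(settings):
    print(f'export {name}="{value}"')
```

Run as-is, this prints the same two export lines shown above.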
RabbitMQ broker settings
These settings control how mapper connects to the RabbitMQ message broker.
- BROKER_HOST: hostname of the RabbitMQ server
- BROKER_PORT: RabbitMQ broker port (commonly: 5671 with TLS and 5672 without)
- BROKER_QUEUE: message queue to read messages from (commonly: mappings)
- BROKER_USER: username to connect to RabbitMQ
- BROKER_PASSWORD: password to connect to RabbitMQ
- BROKER_PREFETCHCOUNT: number of messages to pull from the message server at a time (defaults to 2)
PostgreSQL Database settings
- DB_HOST: hostname for the PostgreSQL database
- DB_PORT: database port (commonly: 5432)
- DB_USER: username for the database
- DB_PASSWORD: password for the database
- DB_DATABASE: database name
- DB_SSLMODE: the TLS encryption policy to use for database connections. Valid options are:
  - disable
  - allow
  - prefer
  - require
  - verify-ca
  - verify-full
More information is available in the PostgreSQL documentation.
Note that if DB_SSLMODE is set to anything but disable, then DB_CACERT needs to be set, and if it is set to verify-full, then DB_CLIENTCERT and DB_CLIENTKEY must also be set.
- DB_CLIENTKEY: key-file for the database client certificate
- DB_CLIENTCERT: database client certificate file
- DB_CACERT: Certificate Authority (CA) certificate for the database to use
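The dependency between DB_SSLMODE and the certificate settings can be checked before deployment with a small script. The Python sketch below encodes only the rule stated in the note above. The function name is hypothetical, and rejecting an unset or unknown DB_SSLMODE is an assumption of this example, since the service's default is not described here.

```python
VALID_SSLMODES = ("disable", "allow", "prefer", "require", "verify-ca", "verify-full")


def missing_db_tls_settings(env):
    """Return the names of TLS settings that are required but unset or empty."""
    mode = env.get("DB_SSLMODE")
    if mode not in VALID_SSLMODES:
        # Assumption: this sketch does not model a default value.
        raise ValueError(f"unsupported DB_SSLMODE: {mode!r}")

    required = []
    if mode != "disable":
        required.append("DB_CACERT")
    if mode == "verify-full":
        required += ["DB_CLIENTCERT", "DB_CLIENTKEY"]
    return [name for name in required if not env.get(name)]
```

Calling `missing_db_tls_settings(os.environ)` (after `import os`) returns an empty list when the environment satisfies the rule.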
Storage settings
The mapper service requires access to the "inbox" storage. To configure that, the following configuration is required:
storage:
  inbox:
    ${STORAGE_IMPLEMENTATION}:
For more details on the available configuration, see the storage/v2 README.md.
Logging settings
- LOG_FORMAT can be set to json to get logs in JSON format. All other values result in text logging.
- LOG_LEVEL can be set to one of the following, in increasing order of severity:
  - trace
  - debug
  - info
  - warn (or warning)
  - error
  - fatal
  - panic
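As a rough illustration of these two settings, the Python sketch below models the format fallback and the severity ordering. It assumes the usual threshold behaviour, where a message is emitted if its level is at least as severe as the configured LOG_LEVEL. The helper names are hypothetical and do not come from the service.

```python
# Levels in increasing order of severity, as listed above.
LEVELS = ["trace", "debug", "info", "warn", "error", "fatal", "panic"]
ALIASES = {"warning": "warn"}


def log_format(value):
    """Only the exact value "json" selects JSON output; anything else is text."""
    return "json" if value == "json" else "text"


def severity(level):
    """Return the position of a level name in LEVELS, resolving the alias."""
    return LEVELS.index(ALIASES.get(level, level))


def should_log(configured_level, message_level):
    """Assumed threshold rule: emit messages at or above the configured level."""
    return severity(message_level) >= severity(configured_level)
```

With LOG_LEVEL set to info, for example, `should_log("info", "debug")` is false while `should_log("info", "error")` is true.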