Troubleshooting
This guide aims to provide general tips on troubleshooting and restoring services to a functional state.
Note
Please also check the questions other users have faced: sensitive-data-archive Discussion. We encourage starting a discussion there, as it will help us improve the current guides.
After deployment checklist
After having deployed the SDA services in a FederatedEGA setup, the following steps can be followed to ensure that everything is up and running correctly.
Services running
The first step is to verify that the services are up and running and the credentials are valid. Make sure that:
- credentials for access to RabbitMQ and Postgres are securely injected into the respective services in the form of secrets
- all the pods/containers are in Ready/Up status and there are no restarts among the pods/containers
- for a FederatedEGA setup, the following pods are required: intercept, ingest, verify, finalize, mapper and a Data Retrieval API
  - check that the pod/container logs contain as the last message (after they have started) time="2023-12-12T19:25:02Z" level=info msg="Starting <service-name> service" for intercept, ingest, verify, finalize, mapper
  - check that the pod/container logs contain as the last message (after they have started) time="2023-12-12T19:28:16Z" level=info msg="(5/5) Starting web server" for the download Data Retrieval service
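The log check above can be automated once the last log line of each pod/container has been collected. The following is a minimal sketch, not part of the SDA tooling: the function names `parse_log_line` and `started_ok` are hypothetical, and it assumes the log lines use the key=value layout shown above.

```python
import re

# Matches key=value pairs where the value is either a double-quoted
# string or a bare token without whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_log_line(line):
    """Split a key=value log line into a dict (hypothetical helper).

    Surrounding double quotes are stripped; escape sequences inside
    quoted values are left untouched.
    """
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1]
        fields[key] = raw
    return fields


def started_ok(last_line, is_download=False):
    """Return True if the last log line is the expected start-up message."""
    fields = parse_log_line(last_line)
    if fields.get("level") != "info":
        return False
    msg = fields.get("msg", "")
    if is_download:
        # Message expected from the download Data Retrieval service.
        return msg == "(5/5) Starting web server"
    # Message expected from intercept, ingest, verify, finalize and mapper.
    return msg.startswith("Starting ") and msg.endswith(" service")
```

The last line of each pod/container log can be passed to `started_ok`; a False result means the service logged something after start-up (or never started) and its log deserves a closer look.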
The next step is to make sure that the remote connections (CentralEGA RabbitMQ) are working. Log in to the RabbitMQ admin page and check that:
- the Federation status in the Admin tab is in state running (this can also be checked with rabbitmqctl federation_status from the command line of a RabbitMQ pod/container)
- the Shovel status in the Admin tab is in state running for all shovels (this can also be checked with rabbitmqctl shovel_status from the command line of a RabbitMQ pod/container)
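If the federation or shovel status has been exported as a JSON array of objects, the check for the running state can be scripted. This is only a sketch under assumptions: the function name `not_running` is hypothetical, the JSON export is assumed to be available in your RabbitMQ version, and the name of the field that holds the state is passed in by the caller because it is not specified in this guide.

```python
import json


def not_running(status_json, state_key):
    """Return the entries whose state is not "running".

    status_json: a JSON array of objects, one per link or shovel
                 (assumed export format, verify against your setup).
    state_key:   the field holding the state (an assumption supplied
                 by the caller, check the actual output first).
    """
    entries = json.loads(status_json)
    return [entry for entry in entries if entry.get(state_key) != "running"]
```

An empty result only means that no listed entry is in a bad state. If the input array itself is empty, no federation link or shovel is defined at all, which should be investigated separately.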
End-to-end testing
Note
This guide assumes that there exists a test instance account with CentralEGA. Make sure that the account is approved and added to the submitters group.
The local development and testing guide provides scripts for testing different parts of the setup, which can be used as a reference.
Upload file(s)
Upload one or more files of different sizes and check that:
- the file(s) exist in the configured inbox of the storage backend (e.g. S3 bucket or POSIX path)
- an entry for the file(s) exists in the database in the sda.files and sda.file_event_log tables
- if the s3inbox is used, there is an uploaded event for each specific file in the sda.file_event_log
- the file(s) exist in the Files listing of the CentralEGA Submission portal, which can be accessed by pressing the three-line menu button (the submission portal URL is specific to each FederatedEGA node)
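The upload checks above amount to comparing three lists: the files that were uploaded, the files listed in the inbox, and the files that have an uploaded event. A minimal sketch of that comparison follows. The function name `upload_report` is hypothetical, and fetching the inbox listing and the rows from sda.file_event_log is assumed to happen elsewhere.

```python
def upload_report(uploaded_files, inbox_listing, uploaded_event_files):
    """Compare uploaded files against the inbox and the event log.

    uploaded_files:       names of the files sent in this test
    inbox_listing:        names found in the configured inbox
    uploaded_event_files: names that have an "uploaded" event
    Returns a dict with the files missing from each place.
    """
    wanted = set(uploaded_files)
    return {
        "missing_in_inbox": sorted(wanted - set(inbox_listing)),
        "missing_uploaded_event": sorted(wanted - set(uploaded_event_files)),
    }
```

Both lists in the result are empty when every uploaded file is present in the inbox and has its uploaded event.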
Make a test submission
Make a submission with the portal and select the file(s) that were uploaded in the previous step. Once the analysis or runs (one of the two is required) step is finished, the messages for the ingestion of the files should appear in the logs of the ingest service. Make sure that:
- the messages are arriving for the file(s) included in the submission
- the ingestion, verify and finalize processes are started and send a message when finished
- the data in the sda.files table is correct
- the files are logged in the sda.file_event_log table for each of the services and files
- the file(s) exist in the configured archive storage backend; see the archive_file_path in the sda.files table for the name of the archived file(s)
- the archived file(s) exist in the configured backup storage backend
- the cancel message works as intended: delete one run in the submitter portal, then add it back again
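Checking that every file has been logged by every service can be tedious for larger submissions. The sketch below shows one way to find gaps, assuming the rows of sda.file_event_log have already been reduced to (file name, event) pairs. The function name `missing_events` is hypothetical, and the set of expected event names is supplied by the caller because this guide does not enumerate them.

```python
from collections import defaultdict


def missing_events(file_names, event_rows, expected_events):
    """Report which expected events are missing for each file.

    file_names:      files included in the submission
    event_rows:      iterable of (file_name, event) pairs
    expected_events: event names every file should have (caller-supplied)
    Returns {file_name: [missing events]} for files with gaps only.
    """
    seen = defaultdict(set)
    for name, event in event_rows:
        seen[name].add(event)

    expected = set(expected_events)
    report = {}
    for name in file_names:
        gaps = expected - seen[name]
        if gaps:
            report[name] = sorted(gaps)
    return report
```

An empty dictionary means every file has all the expected events. A file that appears with its full list of expected events has no rows in the event log at all.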
Finally, when all files have been ingested, the submission portal should allow the submission to be finalised. The submission first needs to be accepted through a helpdesk portal. Once this step is done, make sure that:
- the message for the dataset arrives at the mapper service
- the dataset is created in the database and includes the correct files, which can be checked in the sda.datasets and sda.file_dataset tables
- the dataset has the status registered in the sda.dataset_event_log
- the dataset gets the status released in the sda.dataset_event_log; this might take a while depending on what date was chosen in the submitter portal
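The dataset checks above can be summarised in a small report once the relevant rows have been read from the database. This is a sketch only: the function name `dataset_report` is hypothetical, and it assumes the mapped files (from sda.file_dataset) and the dataset events (from sda.dataset_event_log) have already been fetched as plain lists.

```python
def dataset_report(expected_files, mapped_files, events):
    """Summarise the state of a dataset after the submission is accepted.

    expected_files: files included in the submission
    mapped_files:   files mapped to the dataset in the database
    events:         event names logged for the dataset
    """
    expected, mapped = set(expected_files), set(mapped_files)
    return {
        "missing_files": sorted(expected - mapped),
        "unexpected_files": sorted(mapped - expected),
        "registered": "registered" in events,
        # May stay False for a while, depending on the chosen release date.
        "released": "released" in events,
    }
```

A dataset that is registered but not yet released, with no missing or unexpected files, is the expected intermediate state until the release date chosen in the submitter portal is reached.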
Upon the confirmation of all submission steps, it is reasonable to infer that the deployment's pipeline component is functioning correctly.