Administration aspects overview¶
Table of conventional names¶
The following notations are used in different places of administration documentation
Name
Role
Example
User/Group
ANFISA_ADMIN
User that runs service
ANFISA_ADMIN_GROUP
Group of the user
Directories
ANFISA_ROOT
Root directory
/projects/anfisa
ANFISA_HOME
Top source directory
/projects/anfisa/anfisa
ANFISA_WORK
Directory with service subdirectories
/projects/anfisa/a-setup
URLs
ANFISA_HTML_BASE
top service URL
http://<server>/anfisa/
ANFISA_HTML_APP_BASE
URL to LocalUI and REST
http://<server>/anfisa/app/
Preparation of the service¶
Before start service it is necessary to load gene database to Mongo.
Running the service¶
The service itself and the utilities (see below) should be run under the same user ANFISA_ADMIN
.
In standalone mode the service is started with either of the following command sequences:
$ cd $ANFISA_HOME
$ python -m app.run [path to configuration file]
or
$ env PYTHONPATH=”$ANFISA_HOME”
$ python -m app.run [path to configuration file]
(Make sure if Python virtual environment is properly activated.
In server mode, while setting the uWSGI container, also run the process by the user ANFISA_ADMIN
and use the correct configuration file.
The following commands start resp. stop the uWSGI service:
$ sudo systemctl start anfisa
$ sudo systemctl stop anfisa
Administrator competence¶
Administrator of Anfisa instance is responsible for:
with use of app.storage - Dataset creation and upload utility
create and update primary datasets, keep dataset naming policy
drop primary and derived datasets, when they become out of use
with use of app.adm_mongo - MongoDB storage administration utility
backup solution and tagging information accumulated in Mongo storage
mongo_gene_db
Using server configuration (anfisa.json)¶
In both modes the main configuration file is important. There is the file from the repository $ANFISA_HOME/anfisa.json.template
. This file is the default configuration of the service. To use it in stand-alone mode just copy it to anfisa.json
to the same directory.
It is assumed that it is enough to run the stand-alone mode of the service in the default configuration, just copy template to anfisa.json
to the same directory.
In the server mode, as well as in the stand-alone one but with some modifications, one needs to make a copy of this file and put it in another directory. $ANFISA_ROOT
is recommended. The name of this file can be the same, anfisa.json, or can be changed. It is important to provide access to the same configuration for administrator utilities and for the stand-alone mode to use it in the start service command.
REST API¶
REST API is the kernel of the system. It is a variety of HTTP requests built within the concept of REST API (ask Google about it). In short, these requests satisfy certain architectural conditions and their responses have the form of JSON objects. Both Legacy UI and NextGen Frontend provide HTML pages, and use these requests in a way hidden from the users to transfer data and actions upon the service. If the NextGen Frontend was perfect, there would be no need to use the internal UI at all. However, the configuration aspects for REST API and the internal UI are close to each other.
File content transfer aspect¶
The aspect of transferring file content is a technical one. However, it produces some complexity in the configuration and needs a detailed explanation. Any WEB-service needs to transfer file content in response to HTTP requests. When we have a “main server” (NGINX/Apache/... ), it is good practice to configure it to perform such requests. So in the server mode two kinds of files (used by internal UI) are transferred on the “main server” level: control files *.js
and *.css
, and images. (See also the next note, on anti-cache mechanism).
In the stand-alone mode there is no such thing as “main server”, so file transfer requests should be supported by the service itself. The option **dir-files** in Configuration service: anfisa.json regulates these operations.
Therefore we have two different mechanisms in different modules to do the same thing, and may it look too complex in the context of this document.
There is a third kind of files that should be transferred. Any Export operation produces an Excel file *.xlsx
that should be downloaded by a client (see below in the section about Export). These transfers happen rarely, and the internal service mechanism (via dir-files) is sufficient for them, so there is no need to configure the “main server” for their support.
Anti-cache mirroring mechanism¶
It is used for purposes of the internal UI in the server mode. The problem it solves is the following. The internal UI uses some files (with extensions *.js
and *.css
), and these files are checked out from the repository. So after a push from the repository these files can change. If these files were used by the UI directly, there would be a possibility that the user’s browser will ignore changes in such a file and use some outdated cached copy of its previous version instead of the fresh version of it. The workaround for this problem is to create a mirror directory, copy into it all the necessary files but slightly modify their names in such a way that different versions of the same file will have different names.
This mechanism is recommended for the server mode. However, it can be set up in the stand-only mode as well.
Directory structure: vault, datasets, logs, export directory¶
Strictly speaking, there is no real necessity in the existence of two “standard’ directories: $ANFISA_ROOT
and $ANFISA_WORK
. But the system strongly requires all the subdirectories placed under $ANFISA_WORK
. One can place them anywhere on the computer and modify configuration (anfisa.json) correspondingly.
$ANFISA_WORK/vault
- vault directoryInformation of all the datasets supported by the system is placed here: one dataset - one directory. (There might be a serious need to place this directory in another location: it can happen if the size of a dataset grows and one needs to move the directory to another disk. Just move it, and change the “data-vault” line in the config file correspondingly)
Note: there is no strong need to have this directory excactly inside
$ANFISA_WORK
. It might be large, so there might be a reason to move it on another disk, so just redefine option data-vault in Configuration service: anfisa.json.
Subdirectories of the vault directory
Directories for subsets should be created by calls of the app.storage utility made by user $ANFISA_ADMIN. Removal of an XL-dataset should be also done using this utility, because connection to Druid is required in this process. The removal of a workspace also can be done this way, but it is just equivalent to removal of the corresponding directory.
Please note that this directory contains the empty file active. To turn a dataset out of use in terms of the service one needs to remove this file and restart the service. To re-activate the dataset just create the file active once again (by the utility touch)
$ANFISA_WORK/export
- export directoryThis is the place where the excel-template file is located. The main need is in the subdirectory
$ANFISA_WORK/export/work
This is the place where the system stores all the Excel files generated for export. Each file here is used only once, but there is no automatic procedure to clean files from here later. This clearance should be done by the system administrator periodically
$ANFISA_WORK/logs
- log directoryIn the stand-alone mode only the anfisa.log file is stored here, plus its old portions. In the server mode there are two log files: anfisa.log and uwsgi.log. The former collects all errors that occurred “inside service logic”, the latter collects meaningful errors at the start of the service.
There is no automatic procedure to cleanse this directory either. Administrator should do it periodically.
$ANFISA_WORK/ui
- mirroring directoryUsed in the anti-cache mechanism in the server mode. See details above
Dataset internal structure¶
Minimal dataset data is just annotated JSON file: <dataset>.json.gz
In expanded form, dataset data forms directory with the following:
Inventory of dataset:
<dataset>.cfg
Annotated JSON: <dataset>.json.gz (see Administration file formats reference for details)
Optional subdirectory doc/ with supplementary documentation materials
Optional: BAM-files for samples in dataset case
May be more files
File dataset structure¶
Dataset in system is represented in form of sub-directory inside $ANFISA_WORK/vault
with name equals to dataset name and the following files:
dsinfo.json
- principal information on dataset, in JSON format, contains metadata information and schemas prepared for both viewing regime and filtration purposes
active
- empty file, just remove it to pull the dataset off the system, and restore it (shell commandtouch active
) to push the dataset up again
doc/
- directory with documentation on dataset
fdata.json.gz
- complete data used in filtration, good for use in process of Druid ingestion push
pdata.json.gz
- "short" information about variants, contains only information for representation of variant in short form
vdata.ixbz2
- "full" information about variants, contains full annotated JSON records (with some modifications that can be done on creation stage); the format of file "ixbz2" is special in-house one: it allows direct block access for data with blocks compressed by bzip2 algorithm;
stat.json
- information on value distributions for both viewing and filtration fields, collected on creation stage