Anfisa introduction
Anfisa is a Linux-based Python application (Python 3.5+) that is accessed over HTTP/HTTPS. It works with datasets of mutation variants in the human genome.
The main purpose of the system is to study and select mutation variants for a given case of genomic data, that is, a dataset. The system provides a variety of information associated with variants. Part of this information results from algorithms applied to the dataset itself. Another part consists of data selected from resources that accumulate known information about humanity as a whole (and, more broadly, about all living organisms).
Datasets
The system works with datasets that are available in the vault of the system. A dataset usually represents genomic information for a medical case and concerns a proband patient with relatives. The system also supports datasets with data for cohorts of persons, for use in scientific studies.
Primary datasets are loaded externally by administrators of the system; secondary workspaces can be created by the user from existing datasets.
Kinds of datasets, DNA and transcript variants
The main informational unit of the system is a DNA variant, which is present in the genome at a determined location (chromosome/position) and changes the reference sequence of genomic letters ("ref") to an alternative subsequence ("alt"). In the regime of XL-datasets (eXtra Large), Anfisa supports work with millions of variants or more.
For small datasets (workspaces, WS-datasets), Anfisa provides more intensive ways of working. In particular, in this regime it uses transcript variants as atomic informational units; each of them is the application of a known transcription scenario to a variant. Anfisa also supports manual tagging and an additional zone filtration tool, which give the user access to a short and exact portion of the required information.
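The two kinds of units described above can be pictured with a minimal sketch. This is not Anfisa's internal data model: the class names, field names, and transcript identifiers below are hypothetical and serve only to illustrate how one DNA variant may give rise to several transcript variants.

```python
from typing import List, NamedTuple

# Hypothetical model of a DNA variant: a location plus the ref -> alt change.
Variant = NamedTuple("Variant", [
    ("chromosome", str),
    ("position", int),
    ("ref", str),
    ("alt", str),
])

# Hypothetical model of a transcript variant: a variant paired with one
# transcription scenario (identified here by an invented transcript id).
TranscriptVariant = NamedTuple("TranscriptVariant", [
    ("variant", Variant),
    ("transcript_id", str),
])


def expand_to_transcripts(variant: Variant,
                          transcript_ids: List[str]) -> List[TranscriptVariant]:
    """Produce one transcript variant per known transcript of the variant."""
    return [TranscriptVariant(variant, tid) for tid in transcript_ids]


variant = Variant(chromosome="chr1", position=12345, ref="A", alt="G")
units = expand_to_transcripts(variant, ["transcript-1", "transcript-2"])
print(len(units))
```

In this picture an XL-dataset would be handled at the level of `Variant` items, while a workspace would be handled at the finer level of `TranscriptVariant` items.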
Because the amount of information is huge, the system supports various mechanisms that let the user focus on the most relevant information. The most powerful of these are the filtration procedures: the filtering regime and decision trees. These tools can be used to search for the most important variants inside datasets, as well as to produce secondary datasets for more accurate work with a reduced number of variants.
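As a rough illustration of the two filtration procedures, the sketch below treats a dataset as a list of plain dictionaries. It assumes that the filtering regime keeps records satisfying all conditions, and it simplifies a decision tree to an ordered list of (condition, decision) steps. The property names `quality` and `allele_frequency` and all values are invented for the example; the actual procedures in Anfisa may differ.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Condition = Callable[[Record], bool]


def apply_filter(records: List[Record],
                 conditions: List[Condition]) -> List[Record]:
    """Filtering regime (simplified): keep records that satisfy every condition."""
    return [rec for rec in records if all(cond(rec) for cond in conditions)]


def apply_decision_tree(records: List[Record],
                        steps: List[Tuple[Condition, bool]]) -> List[Record]:
    """Decision tree (simplified): the first step whose condition holds decides
    whether a record is kept (True) or rejected (False).
    Records that match no step are rejected."""
    selected = []
    for rec in records:
        for condition, keep in steps:
            if condition(rec):
                if keep:
                    selected.append(rec)
                break
    return selected


# Invented sample data with hypothetical filtering properties.
dataset = [
    {"id": "v1", "quality": 40, "allele_frequency": 0.001},
    {"id": "v2", "quality": 10, "allele_frequency": 0.001},
    {"id": "v3", "quality": 50, "allele_frequency": 0.20},
]

filtered = apply_filter(dataset, [
    lambda rec: rec["quality"] >= 30,
    lambda rec: rec["allele_frequency"] < 0.01,
])

tree = [
    (lambda rec: rec["quality"] < 30, False),            # reject low quality first
    (lambda rec: rec["allele_frequency"] < 0.01, True),  # then keep rare variants
]
secondary = apply_decision_tree(dataset, tree)
print([rec["id"] for rec in secondary])
```

The list returned by either function plays the role of a secondary dataset: a reduced set of variants selected from the original one.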
Variant properties
Properties of variants are used in the system for two main purposes:
Viewing properties represent information on variants in a form understandable by the user; they are the main atomic items for viewing regimes.
Filtering properties of variants form the low-level data for filtration processes; they are the objects on which conditions are defined.
Work pages of the system
The system supports 4 kinds of Front End pages.
There are also directory pages for the whole vault and for its portions with a fixed root dataset; they are provided at the Back End level by the dirinfo request.
Other features
The system supports a gene symbol database. The data is collected from two sources: HGNC and Ensembl/GTF.
Architecture: Back End, REST API, Front End
The Back End is the kernel of the system. It is written in Python and implements the core functionality of the system.
The Front End is an application that gives the user convenient access to the system from an Internet browser.
To access the Back End, the Front End uses a set of HTTP requests that forms the REST API of Anfisa. The term "REST" means that the API satisfies certain architectural conditions; its responses are in JSON format.
This documentation set describes the Anfisa REST API in detail.
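A client-side call to the REST API might look roughly like the sketch below, which uses only the Python standard library. The base URL is an assumption and must be replaced with the address of an actual installation; the way arguments are encoded (form-encoded body, sent as POST) is also an assumption. Only the request name `dirinfo` comes from this documentation.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

# Assumption: the Back End is reachable at this base URL.
BASE_URL = "http://localhost:8080/anfisa/app/"


def build_request(name, args=None):
    """Build (but do not send) an HTTP request for the REST call `name`."""
    url = urljoin(BASE_URL, name)
    # Assumption: arguments, if any, are passed as a form-encoded body.
    data = urlencode(args).encode("utf-8") if args else None
    # urllib sends POST when a body is present and GET otherwise.
    return Request(url, data=data)


def call(name, args=None):
    """Send the request and decode the JSON response of the Back End."""
    with urlopen(build_request(name, args)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("dirinfo")
print(request.get_method(), request.full_url)
```

With a running Back End, `call("dirinfo")` would return the decoded JSON object for the vault directory; the structure of that object is described in the reference part of this documentation.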
External systems
Anfisa uses the following external systems:
MongoDB: this database is used to store information about user activities; it does NOT contain information about datasets.
Druid OLAP system: this engine is used for efficient support of XL-datasets (Druid is not required when working without XL-datasets).