Decision tree syntax reference¶
A decision tree is a filtering algorithm that allows building complex procedures for selecting variants. The main idea of how a decision tree works is described in Decision Tree. In the text below we define and discuss the dialect of Python that is used to construct a decision tree. One can use it to construct decision trees in a more programmatic way.
Decision tree logic¶
Initially we have the whole set of items (variants) as a working selection.
- At each branching point:
If-instruction selects some subset of the working selection;
Return-instruction determines whether the selected subset should be included in the “final selection” (return True) or excluded (return False):
:: code-block:: python
if condition :
    return bool decision
after the If-instruction the working selection is (probably) reduced, and the next instruction is applied to this reduced set; the next instruction is either one more If-instruction, or…
the final instruction in the code is always a Return-instruction that determines what should be done with the rest of the working selection: include it in the “final selection” (True) or exclude it (False):
:: code-block:: python
return bool decision
- There is only one other type of available instruction, the Label-instruction:
:: code-block:: python
label (string)
This instruction can be inserted into the decision tree code before any If-instruction, so the user can mark the state of the working selection with a label. This label can be used in complex procedures (see Filtering functions, functions Compound_Heterozygous() and Compound_Request()).
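The logic above can be sketched in plain Python. This is only an illustration under stated assumptions, not the system's implementation: the function run_decision_tree, the item dictionaries and the property names af and impact are all hypothetical.

```python
def run_decision_tree(items, steps, final_decision):
    """Apply (condition, decision) steps in order, then the final return.

    items          -- the initial working selection (any iterable)
    steps          -- list of (predicate, bool) pairs, one per If-instruction
    final_decision -- bool of the closing top-level Return-instruction
    """
    working = list(items)
    final_selection = []
    for condition, decision in steps:
        selected = [it for it in working if condition(it)]
        # The selected subset leaves the working selection either way.
        working = [it for it in working if not condition(it)]
        if decision:
            final_selection.extend(selected)
    # The closing Return-instruction decides the fate of whatever is left.
    if final_decision:
        final_selection.extend(working)
    return final_selection


# Hypothetical variants with two made-up properties.
variants = [
    {"id": 1, "af": 0.20, "impact": "HIGH"},
    {"id": 2, "af": 0.001, "impact": "HIGH"},
    {"id": 3, "af": 0.002, "impact": "LOW"},
]
steps = [
    (lambda v: v["af"] > 0.01, False),        # if af > 0.01: return False
    (lambda v: v["impact"] == "HIGH", True),  # if impact == HIGH: return True
]
result = run_decision_tree(variants, steps, final_decision=False)
print([v["id"] for v in result])  # prints [2]
```

Variant 1 is excluded at the first branching point, variant 2 is included at the second, and variant 3 falls through to the final return False.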
Syntax principles¶
There are three levels of details in the description of the Decision Tree Python dialect:
necessary level: the dialect deals with a very restricted subset of Python, so only a small number of Python constructions are allowed in it; below is the complete description of this subset
good practice level: some constructs discussed below are recommended as “good practice”; similar constructs that are not considered good practices could be refactored to their “good practice” analogues in the process of interactive changes of a decision tree
simplification level: since the dialect of Python is very “thin”, for purposes of easy typing and reading it supports the following “simplifications”:
string constants can be typed without quote symbols "" or '' if they are correct Python identifiers or the constants True, False, None
lists vs. sets: when code refers to list objects with [] brackets, it is good practice to use set notation with {} braces instead; indeed, in most cases the order of elements in a “list” is irrelevant, and {} is more readable
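One possible way to read a value set written with these simplifications is sketched below using the standard ast module. The helper read_value_set and the sample values are hypothetical, and treating unquoted True, False and None as the strings "True", "False" and "None" is an assumption based on the description above, not confirmed behaviour of the system.

```python
import ast


def read_value_set(source):
    """Parse a set/list of values written in the simplified dialect form."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, (ast.Set, ast.List)):
        raise ValueError("expected {...} or [...]")
    values = set()
    for elt in node.elts:
        if isinstance(elt, ast.Name):
            # Unquoted identifier: taken as a string constant.
            values.add(elt.id)
        elif isinstance(elt, ast.Constant) and isinstance(elt.value, str):
            # Ordinary quoted string.
            values.add(elt.value)
        elif isinstance(elt, ast.Constant) and (
            isinstance(elt.value, bool) or elt.value is None
        ):
            # Assumption: unquoted True/False/None stand for the same words.
            values.add(str(elt.value))
        else:
            raise ValueError("unsupported value: " + ast.unparse(elt))
    return values


print(sorted(read_value_set('{missense, "stop gained"}')))
```

Lists and sets give the same result here, which reflects the remark that element order is usually irrelevant.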
Top-level constructions¶
- There are three top-level constructions available in the dialect:
:: code-block:: python
if condition :
    return bool decision

return bool decision

label (string)
The following rules must be followed:
All instructions (excluding the Return-sub-instruction of an If-instruction) must start at the first character of a line, with no indentation
A top-level Return-instruction must be the last nonempty line of code
Label-instruction can be used before any If-instruction
Empty lines between top-level constructions are allowed
Comments are acceptable only as a full line, not as part of a line with code; a comment should start with the # character, possibly after spaces (note also that comments are not acceptable after the last instruction)
It is good practice to place comment lines only before top-level instructions
The condition in an If-instruction might be quite long, so one might need multiple lines; it is good practice to use parentheses to group these lines instead of \ characters.
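The instruction-level rules above can be checked mechanically. The following is a minimal sketch with the standard ast module, assuming the code is otherwise valid Python; the helpers is_label and check_top_level and the property name Depth are hypothetical, and the rules about comment placement are not checked here.

```python
import ast
import textwrap


def is_label(stmt):
    """True for a statement of the form label(<one argument>)."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Name)
        and stmt.value.func.id == "label"
        and len(stmt.value.args) == 1
    )


def check_top_level(code):
    """Return a list of messages about top-level rule violations in `code`."""
    # Wrap the code in a function so a top-level `return` is plain Python.
    wrapped = "def _tree():\n" + textwrap.indent(code, "    ")
    body = ast.parse(wrapped).body[0].body
    problems = []
    for idx, stmt in enumerate(body):
        line = stmt.lineno - 1  # undo the offset of the wrapper line
        is_last = idx == len(body) - 1
        if isinstance(stmt, ast.If):
            simple = (len(stmt.body) == 1
                      and isinstance(stmt.body[0], ast.Return)
                      and not stmt.orelse)
            if not simple:
                problems.append(f"line {line}: If must contain a single return")
        elif isinstance(stmt, ast.Return):
            if not is_last:
                problems.append(f"line {line}: top-level return must be last")
        elif is_label(stmt):
            if is_last or not isinstance(body[idx + 1], ast.If):
                problems.append(f"line {line}: label must precede an If")
        else:
            problems.append(f"line {line}: unsupported instruction")
    if not isinstance(body[-1], ast.Return):
        problems.append("code must end with a top-level return")
    return problems


good = 'label("start")\nif Depth < 10:\n    return False\nreturn True\n'
print(check_top_level(good))  # prints []
```

Unexpected indentation of a top-level instruction is already rejected by the Python parser itself, so it surfaces as a SyntaxError rather than as a message in the list.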
Condition constructions¶
Combined conditions¶
Operators and, or and not and parentheses () are fully supported for building complex conditions from atomic ones.
Each atomic condition uses the identifier of the corresponding filtering property exactly once.
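Because combined conditions are ordinary Python boolean expressions, their atomic parts can be located with the standard ast module. The sketch below is illustrative only; the helper atomic_conditions and the property names in the sample are hypothetical.

```python
import ast


def atomic_conditions(condition):
    """Split a combined condition into source strings of its atomic parts."""
    def walk(node):
        if isinstance(node, ast.BoolOp):
            # `and` / `or`: descend into every operand.
            for value in node.values:
                yield from walk(value)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            # `not`: descend into the negated operand.
            yield from walk(node.operand)
        else:
            # Anything else is treated as an atomic condition.
            yield ast.unparse(node)

    return list(walk(ast.parse(condition, mode="eval").body))


sample = "(AF < 0.01 or Callers in {x, y}) and not Zygosity in {z}"
for atom in atomic_conditions(sample):
    print(atom)
```

This relies on ast.unparse, which is available from Python 3.9.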
Atomic numeric condition¶
Has the form of a usual Python comparison operation with operators <, <=, ==, >=, >.
The double form is acceptable, for example:
min_value < property_id <= max_value
Best practice: use only operators <, <=, ==; in the case of operator == place the property identifier on the left.
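A comparison written with > or >= can always be rewritten into the recommended form by reversing the operands. The sketch below shows one way to do this with the standard ast module (Python 3.9 or later for ast.unparse); the helper normalize_numeric and the property names AF and Depth are hypothetical, and this is not necessarily how the system performs its own refactoring.

```python
import ast

# Operators that are flipped when the operand order is reversed.
FLIP = {ast.Gt: ast.Lt, ast.GtE: ast.LtE}


def normalize_numeric(source):
    """Rewrite a numeric comparison so that it uses only <, <= and ==."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Compare):
        raise ValueError("not a comparison")
    operands = [node.left] + node.comparators
    ops = list(node.ops)
    if all(type(op) in FLIP for op in ops):
        # a > b >= c  becomes  c <= b < a
        operands = operands[::-1]
        ops = [FLIP[type(op)]() for op in reversed(ops)]
    elif (len(ops) == 1 and isinstance(ops[0], ast.Eq)
          and isinstance(operands[1], ast.Name)
          and not isinstance(operands[0], ast.Name)):
        # value == property  becomes  property == value
        operands = operands[::-1]
    rewritten = ast.Compare(left=operands[0], ops=ops,
                            comparators=operands[1:])
    return ast.unparse(rewritten)


print(normalize_numeric("10 >= Depth > 1"))  # prints 1 < Depth <= 10
```

Comparisons that already follow the recommendation, or that mix both directions, are returned unchanged.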
Atomic enumerated condition¶
Has a different form depending on the join mode of the condition:
OR: property_id in {set/list of value strings}
AND: property_id in all({set/list of value strings})
NOT: property_id not in {set/list of value strings}
Notes:
the notation above uses {} set braces; though this is recommended as good practice, list brackets [] are also supported
operator in is supported for all enumerated properties, including status (single-value) and multiset (multi-value) properties. For a status property its semantics are simple and intuitive. For a multiset property the notation is more sophisticated: the condition is positive when the intersection of the two sets is nonempty, i.e. at least one value of the property matches at least one value in the given set; it can be “explained” by saying that the object representing the filtering property redefines operator in from the left
in the case of AND join mode the interpretation of the all() pseudo-function is even more sophisticated: it can be “explained” by saying that the result of all() redefines the in operation in a very specific way from the right
in terms of the Decision Tree there is no strong need for NOT join mode, because operator not is supported outside atomic conditions
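The three join modes can be illustrated with ordinary Python sets. This is a sketch of the semantics only: the function enum_condition and the sample values are hypothetical, and reading AND mode as "every given value is present in the property" is an assumption, since the text above does not spell it out.

```python
def enum_condition(property_values, given, mode="OR"):
    """Evaluate an enumerated condition for one item.

    property_values -- values of the property for the item; a status
                       (single-value) property is passed as a one-element set
    given           -- the set/list of value strings from the condition
    mode            -- "OR", "AND" or "NOT"
    """
    prop = set(property_values)
    given = set(given)
    if mode == "OR":
        # Positive when the intersection is nonempty.
        return bool(prop & given)
    if mode == "AND":
        # Assumption: positive when every given value is present.
        return given <= prop
    if mode == "NOT":
        # Negation of the OR form.
        return not (prop & given)
    raise ValueError("unknown join mode: " + mode)


callers = {"A", "B"}  # a made-up multiset property value
print(enum_condition(callers, {"B", "C"}, "OR"))   # prints True
print(enum_condition(callers, {"B", "C"}, "AND"))  # prints False
print(enum_condition(callers, {"B", "C"}, "NOT"))  # prints False
```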
Atomic function conditions¶
Function conditions have a form similar to enumerated conditions, with the property id replaced by
function_name (parameters)
The syntax for parameters is standard Python. Since all values of the parameters must be JSON objects (however, with the JS constants true/false/null changed to their Python counterparts True/False/None), there should be no problems in setting parameters up. (“Simplifications” are also acceptable for parameters.)
See Filtering functions for a reference of available functions and their parameters.
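The correspondence between Python-style parameters and JSON can be sketched as follows. The helper parse_function_condition, the function name My_Func and its parameters are all hypothetical and are not taken from the list of available filtering functions; the sketch handles keyword parameters with literal values only and ignores the “simplified” unquoted form.

```python
import ast
import json


def parse_function_condition(source):
    """Return (function name, parameters as a JSON string)."""
    node = ast.parse(source, mode="eval").body
    if not (isinstance(node, ast.Compare)
            and isinstance(node.left, ast.Call)
            and isinstance(node.left.func, ast.Name)):
        raise ValueError("not a function condition")
    call = node.left
    # literal_eval accepts only plain literals, which matches the
    # requirement that parameter values be JSON-compatible.
    params = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, json.dumps(params, sort_keys=True)


name, params = parse_function_condition(
    'My_Func(limit=2, strict=True, state=None) in {ok}'
)
print(name)    # prints My_Func
print(params)  # prints {"limit": 2, "state": null, "strict": true}
```

The output shows the Python constants True and None turning back into their JS counterparts true and null.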
Property and function identifiers¶
Each identifier used as property or function (property_id or function_name above) corresponds to only one property or function available in evaluation space. So each available identifier can be used in only one type of atomic conditions.
However, the identifier of an atomic condition can be absent from the evaluation space; the corresponding atomic condition is then considered correct but inactive: it is interpreted as always negative (and positive in NOT mode of enumerated and function conditions).
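The treatment of inactive conditions can be sketched as a small wrapper around the evaluation of an atomic condition. The evaluation space is modelled here as a plain dictionary, and the function eval_atomic and the identifiers Depth and Missing are hypothetical.

```python
def eval_atomic(space, identifier, test, negated=False):
    """Evaluate one atomic condition against an evaluation space.

    space      -- dict mapping available identifiers to item values
    identifier -- property or function identifier used by the condition
    test       -- predicate applied to the value when it is available
    negated    -- True for the NOT mode of enumerated/function conditions
    """
    if identifier not in space:
        # Inactive condition: always negative, hence positive in NOT mode.
        return negated
    result = test(space[identifier])
    return (not result) if negated else result


space = {"Depth": 12}
print(eval_atomic(space, "Depth", lambda v: v > 10))                    # prints True
print(eval_atomic(space, "Missing", lambda v: v > 10))                  # prints False
print(eval_atomic(space, "Missing", lambda v: v in {"a"}, negated=True))  # prints True
```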
Decision Tree system support¶
The following objects are extracted from the code of a decision tree:
Points correspond to instructions in the code; each If- or Return-instruction corresponds to a point with a state of the selection set: either the working one or the pre-final one. The user needs to know how many items (variants) are in these sets, and can also study the distribution of values of filtering properties for the items in these sets.
Atomic conditions are “atomic” fragments of the condition in If-instructions. There can be many atomic conditions in one If-instruction. Locating them and supporting their modification is an important function of the system.
State labels can be defined in code by Label-instructions. They are used with complex functions. This functionality requires a high level of qualification and attentiveness from the user; however, it might be very important in practice.
A decision tree can be modified in either of two ways:
manual typing and modifications of decision tree code
interactive actions modifying various details of decision tree
The interactive mode allows any meaningful transformation of a decision tree, so there is no strong need to use the manual mode at all. The manual mode is helpful for complex manipulations with the boolean logic of conditions and, of course, for copy/paste operations.