Discriminative Power specification

This document describes the algorithm used to calculate Discriminative Power.

Formulation of the problem

As noted in the filters description, Discriminative Power is a numeric characteristic that indicates the “effectiveness” of a filter. An “effective” filter splits the current set of variants into the most strongly differing groups, i.e. with the maximum loss of “entropy”.

One option would be to evaluate plain entropy, but that is not very helpful when the set of variants is large (up to millions). So instead of proper entropy we evaluate an entropy-like value; the procedure is described below.

Procedure of evaluation

For each property, we start from the property status data and form a list of numeric (positive integer) values, one integer for each of the “choices”:

  • For a numeric property we have a histogram of values; use it as the “list of choices”.

  • For an enumerated property we have a count for each value; use these counts as the base list for the “list of choices”, and take into account the “total count”, which equals the filtered count of variants (or transcripts, in some cases):
    • For a status property, the sum of the values in the base list equals the total count, so the base list is final.

    • If the sum of the values in the base list is less than the total count, append the difference to the base list; otherwise use the base list as is.
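
The rule for enumerated properties can be sketched as follows. This is a minimal illustration only: the helper name buildChoiceList and its parameters are hypothetical and do not come from the original code, and reading the appended difference as "items without any value of the property" is an assumption.

```javascript
// Hypothetical helper (not part of the original code): builds the
// "list of choices" for an enumerated property.
//   baseCounts - count of items for each value of the property
//   totalCount - filtered count of variants (or transcripts)
function buildChoiceList(baseCounts, totalCount) {
    var sum = 0;
    for (var j = 0; j < baseCounts.length; j++) {
        sum += baseCounts[j];
    }
    // Copy the base list so the caller's array is not modified.
    var choices = baseCounts.slice();
    if (sum < totalCount) {
        // Assumption: the remainder stands for items with no value
        // of this property; it becomes one extra choice.
        choices.push(totalCount - sum);
    }
    return choices;
}

// Status property: the counts already cover every item, list unchanged.
var statusChoices = buildChoiceList([60, 40], 100);    // [60, 40]
// Sum is below the total: the difference (70) is appended.
var sparseChoices = buildChoiceList([20, 10], 100);    // [20, 10, 70]
// Sum exceeds the total (overlapping values): list unchanged.
var overlapChoices = buildChoiceList([80, 50], 100);   // [80, 50]
```

The third call shows the multivalued case with overlap, where the sum is larger than the total count and the base list is used as is.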

Note: The procedure is rough, so for now we ignore the specifics of multivalued enumerated properties with wide overlap.

Using the “list of choices”, we evaluate Discriminative Power with the following procedure (in JavaScript):

Discriminative power calculation
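function entropyReport(counts) {
    var j, quote;
    // Total number of items over all choices
    var total = 0.;
    for (j = 0; j < counts.length; j++) {
        total += counts[j];
    }
    // Too few items to evaluate
    if (total < 3)
        return -1;
    // Entropy (in bits) of the distribution over the choices
    var sum_e = 0.;
    var cnt = 0;  // number of non-empty choices (not used in the result)
    for (j = 0; j < counts.length; j++) {
        if (counts[j] == 0)
            continue;
        cnt++;
        quote = counts[j] / total;
        sum_e -= quote * Math.log2(quote);
    }
    // Normalize by log2 of the total count, then by a divisor
    // that depends on the number of choices
    var norm_e = sum_e / Math.log2(total);
    var divisor = reduceTotal(counts.length);
    var pp = norm_e / divisor;
    return Math.min(1.0, pp);
}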
/*************************************/
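// Divisor for the normalized entropy; the argument is the number of
// choices (counts.length), not the total count of variants
function reduceTotal(total) {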
    if (total < 10)
        // for N < 10; easy to select
        return 0.3 + 0.02 * (total);
    if (total < 25)
        // for N < 25; still possible to view all values together on one page
        return 0.5 + (total - 10) / 15;
    if (total < 125)
        // for N < 125; easy to scroll
        return 2 + Math.sqrt(total - 25) / 5;
    if (total < 625)
        // for N < 625; still possible to scroll
        return 4 + (total - 125) / 100;
    if (total < 3125)
        // for N < 3125; difficult to scroll, exact N no longer matters
        return 9 + (total - 625) / 200;
    // for N >= 3125; visual selection
    return 21.5;
}
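
Read as a formula, the listing above appears to compute the following. This is only a restatement of the code: the symbols c_i (the counts), n (the number of choices, counts.length), T, p_i, H and r (standing for reduceTotal) are introduced here for illustration. For T < 3 the function returns -1 instead. The last line is a worked example for two equal choices of 50 items each.

```latex
\[
T = \sum_{i=1}^{n} c_i, \qquad
p_i = \frac{c_i}{T}, \qquad
H = -\sum_{i:\, c_i > 0} p_i \log_2 p_i
\]
\[
\mathrm{DP} = \min\!\left(1,\ \frac{H}{\log_2 T \cdot r(n)}\right)
\]
% Worked example: c = (50, 50), so T = 100, n = 2, H = 1, r(2) = 0.3 + 0.02 \cdot 2 = 0.34
\[
\mathrm{DP} = \min\!\left(1,\ \frac{1}{\log_2 100 \cdot 0.34}\right)
\approx \frac{1}{6.644 \cdot 0.34} \approx 0.443
\]
```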