Discriminative Power specification¶
This document describes an algorithm of Discriminative Power calculation.
Formulation of the problem¶
As was said in the filters description, the Discriminative power is a special numeric characteristic used to indicate the “Effectivity” of a filter. This means that “Effective” filter can split the current set of variants on the most strongly differing groups, e.g. with the maximum loss of “entropy”.
As a possibility, we can use here evaluation of plain proper entropy, but it is not so helpful if the set of variants is wide (up to millions). So, we evaluate some entropy-like product instead of proper entropy, and below is the description of the procedure.
Procedure of evaluation¶
For each property, we start from property status data, and form list a of numeric (positive integer) values, one integer for each of the “choices”:
For numeric property we have a histogram of values, use this as a “list of choices”
- For enumerated property we have a count of each value - use this as a base list for the “list of choices”, and take in account the “total count”, which equals to the filtered-counts of variants (or transcripts, in some cases)
For status property the sum of values in the base list equals to the total count, the base list is the final
If the sum of values in the base list is less than the total count, append the difference to the base list, otherwise use the base one as it
Note: The procedure is rough, so we ignore (up to now) the specifics of multivalued enumerated properties with wide overlap. With the use of the “list of choices” we evaluate Discriminative Power by the following procedure (in JavaScript language):
function entropyReport(counts) {
var total = 0.;
for (j = 0; j < counts.length; j++) {
total += counts[j]
}
if (total < 3)
return -1;
var sum_e = 0.;
var cnt = 0;
for (j = 0; j < counts.length; j++) {
if (counts[j] == 0)
continue;
cnt++;
quote = counts[j] /total
sum_e -= quote * Math.log2(quote);
}
var norm_e = sum_e / Math.log2 (total);
var divisor = reduceTotal(counts.length)
var pp = norm_e / divisor;
return Math.min(1.0, pp);
}
/*************************************/
function reduceTotal(total) {
if (total < 10)
// for N < 10; easy to select
return 0.3 + 0.02 * (total);
if (total < 25)
// for N < 25; still possible to view all values together on one page
return 0.5 + (total - 10) / 15;
if (total < 125)
// for N < 125; easy to scroll
return 2 + Math.sqrt(total - 25) / 5;
if (total < 625)
// for N < 625; still possible to scroll
return 4 + (total - 125) / 100;
if (total < 3125)
// for N < 3125; difficult to scroll, exact N no longer matters
return 9 + (total - 625) / 200;
// for N >= 3125; visual selection
return 21.5;
}