Cell Biochemistry Martinsried
- Percentile pairs, e.g. 5/95%, 10/90% or 15/95%, are calculated from
the value distribution of the reference samples in each database
column of the learning set
(fig.7).
- The values of each data column are transformed
into triple matrix characters
(fig.8)
by assigning (-) to values below the lower percentile,
(0) to values between the lower and upper percentiles and
(+) to values above the upper percentile
of the value distribution of the reference samples.
- A confusion matrix is then set up with the known
predictive or diagnostic clinical classification of the reference
and abnormal samples on the ordinate and the same
classification categories assigned by the CLASSIF1 triple matrix
classifier on the abscissa
(fig.9).
Correct classification of all samples is indicated
by 100% recognition values in the diagonal boxes of the confusion
matrix and 0% values in the off-diagonal boxes. The initial
classification of learning sets usually falls well short
of this ideal result
(fig.10A).
- The iterative CLASSIF1 optimization aims to raise
the sum of the diagonal box values above the initial
sum obtained when all database columns are used for classification
(fig.10B).
To this end, single database columns or variable combinations of
database columns are temporarily excluded from the
classification process to test whether their absence improves
or worsens the classification result.
Once no further significant
classification improvement is achieved,
database columns that improved the classification result neither alone
nor in combination with other database columns
are permanently excluded, so that only the discriminatory
database columns remain as disease classification masks
(fig.3)
for both the normal and the abnormal classification categories.
- Reclassification of the learning set
(fig.2)
and the correct prospective classification of unknown test patients
(fig.6)
demonstrate the validity of the concept.
- Triple matrix classifiers are inherently standardized to the
reference samples during the classification process
(standardized multiparameter data classification, SMDC).
Classifiers can therefore be compared independently of instrument
and laboratory, provided the CLASSIF1 algorithm detects no differences
between the various reference groups.
This is advantageous for consensus formation, e.g. on leukemia,
HIV and thrombocyte classifications by immunophenotyping.
- The performance of triple matrix classifiers depends
on intralaboratory precision rather than on
accuracy, since inaccuracies of measurement cancel out
within certain limits through the normalization of the experimental
values to the respective mean values of the reference samples in each
database column. Reference groups are typically made up
of age- and sex-matched patients.
- As a practical consequence, diseases
can be classified at institutions where sufficient learning sets cannot
be generated in a reasonable time or where costly investigations would be
necessary to establish appropriate learning sets.
3. Examples
- FCS 1.0 and 2.0 list mode files, or dBase3 database
exports from database programs (e.g. Access) or spreadsheet
programs (e.g. Excel), are classified by the CLASSIF1 algorithm
in various clinical situations:
4. Database Download
Comparative multiparameter data classification efforts are significantly
hampered by the lack of suitable databases. The downloadable
CLASSIF1 database (PHLUNG1.XLS) of
flow cytometric intracellular pH and esterase activity measurements
in vital cell preparations of malignant cells and adjacent healthy cells
from lung cancer patients may serve as an example for comparative
classification efforts. Three-parameter ADB/PI/electronic cell volume
list mode files were exhaustively analysed as described in
Liewald et al., Cytometry 11:341-348 (1990).
5. Conclusions
- The CLASSIF1 algorithm provides access to
Predictive Medicine (>95% correct disease
course prediction in individual patients) as well as to
standardized diagnostic classifications.
- The CLASSIF1 approach facilitates the development of
interlaboratory consensus classifiers in complex
multiparameter data mining analyses.
- Furthermore, the molecular and biochemical properties of
many body cell systems during disease can be compared
by standardized classification, e.g. blood leukocytes versus
tissue or effusion leukocytes.
© 2024 G.Valet