Cell Biochemistry Martinsried
Medical Bioinformatics in Cytomics
Flow Cytometry Data Pattern Analysis by the CLASSIF1 Data
1. Potential of Data Pattern Classification
- Flow cytometry data analysis often aims only at determining
cell frequencies within one or several multidimensional gates.
An essential part of the potentially useful information, such as
fluorescence intensities, average fluorescence surface densities,
intercolour fluorescence ratios, coefficients of variation of the
fluorescence, light scatter intensities, or scatter and fluorescence
ratio distributions of cell populations, usually remains unexploited.
- Data pattern classification in cytomics (cell systems research)
aims at exhaustive knowledge extraction from flow cytometric or
other multiparameter data by determining the most discriminatory
data patterns for individualized disease course predictions or diagnostics.
- CLASSIF1 algorithmic data sieving permits the development
of standardized, instrument- and laboratory-independent
data pattern classifiers from flow cytometric list mode, flow bead array,
cDNA or protein expression chip array (Lymphochip, Affymetrix),
clinical chemistry, biomedical or clinical data for predictive
or for diagnostic purposes (medical and clinical cytomics).
2. CLASSIF1 Algorithm
The algorithm operates as follows:
- The values of each data column are transformed
into triple matrix characters
by assigning '-' (diminished) to values below the lower
percentile, '0' (unchanged) to values between the lower
and upper percentiles, and '+' (increased) to values
above the upper percentile of the value distribution of the
reference patient samples.
- Percentile pairs such as 10/90%, 15/95% or 20/80% are
calculated from the value distribution of the reference patient
samples for each database column of the learning set.
- A classification (confusion) matrix is established that cross-tabulates the
known predictive or diagnostic clinical classification of the reference
and abnormal patient samples against the same
classification categories as assigned by the CLASSIF1 triple matrix.
- Correct classification of all patient samples is
indicated by 100% recognition values in the diagonal boxes of the
classification matrix and 0% values in the off-diagonal boxes. The
initial classification of a learning set typically falls
well short of this ideal result.
- The iterative CLASSIF1 optimization aims to raise the sum of the
diagonal box values above its initial value (fig.10B)
by temporarily excluding single database columns or variable
combinations of them from the classification process and testing whether
their absence improves or worsens the classification result.
At the end, the most frequent triple matrix characters of
the database columns that improve the classification
form the disease classification masks,
while all other data columns are definitively excluded.
- Each patient is assigned to the disease class whose classification
mask shows the highest positional coincidence with the patient's
individual classification mask. Similar results for the
reclassification of the learning set
and for the prospective classification of unknown test patients
confirm the validity of the concept.
- Triple matrix classifiers are inherently standardized to the
reference samples during the classification process
(standardized multiparameter data classification, SMDC).
Classifiers can therefore be compared independently of instrument
and laboratory, provided the CLASSIF1 algorithm detects no
differences between the various reference groups.
This is advantageous for consensus formation, e.g. on leukemia,
HIV and thrombocyte classifications by immunophenotyping.
- For classification purposes, the equivalence of reference groups
from different institutions is established by showing
that the various reference groups cannot be discriminated
from each other by data pattern analysis.
- The systematic analysis of patient
classification masks provides information on individual
genotypic and exposure influences on
expressed data patterns. Such analysis may prove useful for
developing a relational classification system similar
to the periodic table of the elements. In such a system, different
cell types and their activity states could be compared in a
standardized way, for example during disease development or
under therapy, but also during cell division, cell differentiation
or cell migration.
- The performance of triple matrix classifiers depends
on intralaboratory precision rather than on accuracy,
since systematic measurement errors cancel out, within certain limits,
through the normalization of the experimental
values to the respective mean values of the reference samples in each
database column. Reference groups are typically constituted
from age- and sex-matched patients.
- FCS 1.0 and 2.0 list mode files, or dBase3 exports
from database programs (e.g. Access) or spreadsheet
programs (e.g. Excel), are classified by the CLASSIF1 algorithm
in various clinical situations:
- the CLASSIF1 algorithm provides access to individualized
disease course predictions, with >95% correct prediction
in individual patients, as well as to
standardized diagnostic classifications.
- the CLASSIF1 approach facilitates the development of
interlaboratory consensus classifiers in complex
multiparameter data sieving or data mining analyses.
- as a practical consequence, diseases
can be classified at institutions where sufficient learning sets cannot
be generated in a reasonable time or where costly investigations would be
necessary to establish appropriate learning sets.
- furthermore, the molecular and biochemical properties of
many of the body's cell systems during disease can be compared
by standardized classification, e.g. blood leukocytes versus
tissue or effusion leukocytes.
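The triple matrix encoding described at the start of section 2 (percentile limits from the reference samples, then '-', '0' or '+' per value) can be sketched in Python as follows. This is a minimal sketch, not the CLASSIF1 program itself: the linear-interpolation percentile, the function names `percentile`, `encode_column` and `encode_table`, the one-row-per-patient table layout, and counting values that lie exactly on a percentile limit as '0' are all assumptions.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile q (0..100) with linear interpolation between sorted values (assumed convention)."""
    s = sorted(values)
    if not s:
        raise ValueError("empty reference distribution")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def encode_column(reference: Sequence[float], samples: Sequence[float],
                  lower_q: float = 10.0, upper_q: float = 90.0) -> List[str]:
    """Map each sample value to '-', '0' or '+' relative to the reference percentiles."""
    low = percentile(reference, lower_q)
    high = percentile(reference, upper_q)
    chars = []
    for v in samples:
        if v < low:
            chars.append("-")   # diminished
        elif v > high:
            chars.append("+")   # increased
        else:
            chars.append("0")   # unchanged (limits themselves count as '0' in this sketch)
    return chars


def encode_table(reference_rows: Sequence[Sequence[float]],
                 sample_rows: Sequence[Sequence[float]],
                 lower_q: float = 10.0, upper_q: float = 90.0) -> List[str]:
    """Encode every database column separately; return one triple matrix string per sample row."""
    n_cols = len(reference_rows[0])
    columns = [
        encode_column([r[j] for r in reference_rows],
                      [r[j] for r in sample_rows], lower_q, upper_q)
        for j in range(n_cols)
    ]
    return ["".join(col[i] for col in columns) for i in range(len(sample_rows))]


if __name__ == "__main__":
    # Hypothetical data: 10 reference patients with 2 parameters each, and 2 patients to encode.
    reference = [[i, 10 * i] for i in range(1, 11)]
    patients = [[0.5, 50], [12, 5]]
    print(encode_table(reference, patients))  # ['-0', '+-']
```

Trying other percentile pairs, such as 20/80%, only means calling `encode_table` again with different `lower_q` and `upper_q` values.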
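The classification (confusion) matrix of section 2 and the diagonal sum that the CLASSIF1 optimization tries to increase can be computed as below. This is a hedged sketch under the following assumptions: rows are the known clinical classes, each entry is the percentage of that row's samples, and a prediction outside the listed classes (for example a hypothetical 'unclassified' label) simply counts against the diagonal. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_matrix(known: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Rows: known clinical class; columns: assigned class; values: % of the row's samples."""
    pair_counts = Counter(zip(known, predicted))
    row_totals = Counter(known)
    matrix = {}
    for k in classes:
        n = row_totals[k]
        matrix[k] = {p: (100.0 * pair_counts[(k, p)] / n if n else 0.0) for p in classes}
    return matrix


def diagonal_sum(matrix: Dict[str, Dict[str, float]]) -> float:
    """Sum of the diagonal recognition values; 100 times the number of classes is ideal."""
    return sum(matrix[c][c] for c in matrix)


if __name__ == "__main__":
    # Hypothetical learning set: 4 reference and 4 abnormal samples.
    known = ["ref"] * 4 + ["sick"] * 4
    predicted = ["ref", "ref", "ref", "sick", "sick", "sick", "ref", "sick"]
    m = confusion_matrix(known, predicted, ["ref", "sick"])
    print(m)                # {'ref': {'ref': 75.0, 'sick': 25.0}, 'sick': {'ref': 25.0, 'sick': 75.0}}
    print(diagonal_sum(m))  # 150.0 (ideal: 200.0)
```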
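The construction of disease classification masks, the classification by highest positional coincidence, and the iterative exclusion of database columns described in section 2 can be combined into one small loop. The sketch below is a strong simplification and an assumption throughout, not the CLASSIF1 procedure: it starts from already encoded triple matrix strings, tries only single-column exclusions in a greedy backward pass (the text also mentions combinations of columns), drops a column when its absence does not lower the diagonal sum, breaks coincidence ties in favour of the first class, and builds each mask from the most frequent character per retained column within that class. All names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_masks(codes: Sequence[str], labels: Sequence[str],
                columns: Sequence[int]) -> Dict[str, str]:
    """Mask per class: most frequent triple matrix character in each retained column."""
    masks = {}
    for cls in dict.fromkeys(labels):  # classes in first-seen order
        rows = [c for c, lab in zip(codes, labels) if lab == cls]
        masks[cls] = "".join(Counter(r[j] for r in rows).most_common(1)[0][0]
                             for j in columns)
    return masks


def classify(code: str, masks: Dict[str, str], columns: Sequence[int]) -> str:
    """Assign the class whose mask coincides at the most positions (first class wins ties)."""
    sub = [code[j] for j in columns]
    return max(masks, key=lambda cls: sum(a == b for a, b in zip(sub, masks[cls])))


def diagonal_sum(codes: Sequence[str], labels: Sequence[str],
                 columns: Sequence[int]) -> float:
    """Sum of per-class % correct when the learning set is reclassified."""
    masks = build_masks(codes, labels, columns)
    total = 0.0
    for cls in masks:
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        hits = sum(classify(codes[i], masks, columns) == cls for i in idx)
        total += 100.0 * hits / len(idx)
    return total


def optimize(codes: Sequence[str], labels: Sequence[str]) -> List[int]:
    """Greedy backward pass: drop a column if its absence does not lower the diagonal sum."""
    columns = list(range(len(codes[0])))
    best = diagonal_sum(codes, labels, columns)
    improved = True
    while improved and len(columns) > 1:
        improved = False
        for j in list(columns):
            trial = [c for c in columns if c != j]
            score = diagonal_sum(codes, labels, trial)
            if score >= best:
                columns, best, improved = trial, score, True
                break
    return columns


if __name__ == "__main__":
    # Hypothetical learning set: column 0 separates the classes, column 1 is noise.
    codes = ["0-", "0-", "0+", "++", "+-", "++"]
    labels = ["ref", "ref", "ref", "sick", "sick", "sick"]
    print(round(diagonal_sum(codes, labels, [0, 1]), 1))  # 166.7
    kept = optimize(codes, labels)
    masks = build_masks(codes, labels, kept)
    print(kept, masks)                                    # [0] {'ref': '0', 'sick': '+'}
    print(classify("+0", masks, kept))                    # sick (hypothetical test patient)
```

In this toy learning set the noise column pulls one abnormal sample towards the reference mask, so excluding it raises the diagonal sum from about 166.7 to 200.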
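The statement in section 2 that classifier performance depends on precision rather than accuracy can be illustrated by normalizing each column to the mean of its reference samples. This sketch assumes a purely multiplicative instrument gain error applied equally to reference and patient values measured in the same laboratory. The function name, the gain factor and the intensity values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def normalize_to_reference(reference: Sequence[float],
                           samples: Sequence[float]) -> List[float]:
    """Express sample values as multiples of the reference-group mean of the column."""
    ref_mean = mean(reference)
    return [v / ref_mean for v in samples]


if __name__ == "__main__":
    reference = [95.0, 100.0, 105.0, 110.0]  # hypothetical fluorescence intensities
    patients = [60.0, 103.0, 180.0]
    gain = 1.8                               # hypothetical miscalibration factor

    correct = normalize_to_reference(reference, patients)
    biased = normalize_to_reference([gain * v for v in reference],
                                    [gain * v for v in patients])
    # The gain appears in numerator and denominator and cancels (up to rounding).
    print(all(abs(a - b) < 1e-12 for a, b in zip(correct, biased)))  # True
```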
Max-Planck-Institut für Biochemie, Am Klopferspitz 18a,
D-82152 Martinsried, Germany
Last update: Apr 10, 2004
First display: Oct 10, 1995