Cell Biochemistry Martinsried

Medical Bioinformatics in Cytomics

Flow Cytometry Data Pattern Analysis by the CLASSIF1 Data Sieving Algorithm


1. Potential of Data Pattern Classification

- Flow cytometry data analysis often aims only at determining cell frequencies within one or several multidimensional gates. An essential part of the potentially useful information, such as fluorescence intensities, average fluorescence surface densities, intercolour fluorescence ratios, coefficients of variation of the fluorescence, light scatter intensities, or scatter and fluorescence ratio distributions of cell populations, usually remains unconsidered.
- Data pattern classification in cytomics (cell systems research) aims at exhaustive knowledge extraction from flow cytometric or other multiparameter data by determining the most discriminatory data patterns for individualized disease course predictions or diagnostics.
- CLASSIF1 algorithmic data sieving permits the development of standardized, instrument- and laboratory-independent data pattern classifiers from flow cytometric list mode, flow bead array, cDNA or protein expression chip array (Lymphochip, Affymetrix), clinical chemistry, biomedical or clinical data for predictive medicine by cytomics or for diagnostic purposes (literature references: bioinformatics, medical and clinical cytomics).

2. CLASSIF1 Algorithm

The algorithm operates as follows:
- The values of each data column are transformed into triple matrix characters (fig.7) by assigning (- = diminished) to values below the lower percentile, (0 = unchanged) to values between the lower and upper percentiles, and (+ = increased) to values above the upper percentile of the value distributions of the reference patient samples.
- Percentiles 10/90%, 15/95%, 20/80% etc. are calculated for the value distribution of the reference patient samples for each database column of the learning set.
- A classification (confusion) matrix is established that compares the known predictive or diagnostic clinical classification of reference and abnormal patient samples with the same classification categories as assigned by the CLASSIF1 triple matrix classifier.
- Correct classification of all patient samples is indicated by 100% recognition values in the diagonal boxes of the classification matrix and 0% values in the non-diagonal boxes. The initial classification of learning sets is typically far from this ideal result.
- The iterative CLASSIF1 optimization aims to increase the sum of the diagonal box values above its initial value (fig.10B) by temporarily excluding single database columns or variable combinations of columns from the classification process to see whether their absence improves or deteriorates the classification result. At the end, the most frequent triple matrix characters of the database columns that improve the classification constitute the disease classification masks (fig.3), while all other data columns are definitively excluded.
- The patients are classified according to the highest positional coincidence of their individual patient classification mask with any one of the disease classification masks. The similar results for the reclassification of the learning set (fig.2) and for the prospective classification of unknown test patients (fig.6) show the validity of the concept.
- Triple matrix classifiers are inherently standardized onto the reference samples during the classification process (standardized multiparameter data classification (SMDC)). Classifiers can therefore be compared in an instrument- and laboratory-independent way, provided that the CLASSIF1 algorithm detects no differences between the various reference groups. This is advantageous for consensus formation, e.g. on leukemia, HIV and thrombocyte classifications by immunophenotyping.
- The equivalence of reference groups from different institutions for classification purposes is assured by showing that data pattern analysis cannot discriminate the various reference groups from each other.
- The systematic analysis of patient classification masks provides information on individual genotypic and exposure influences on expressed data patterns. Such analysis may prove useful for the development of a relational classification system similar to the periodic system of elements. In such a system, different cell types and their activity states could be compared in a standardized way, for example during disease development or under therapy, but also during cell division, cell differentiation or cell migration.
- The performance of triple matrix classifiers depends on intralaboratory precision rather than on accuracy, since differences in measurement accuracy cancel out within certain limits through the normalization of the experimental values to the respective mean values of the reference samples in each database column. Reference groups are typically constituted from age- and sex-matched patients.
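
The steps above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the CLASSIF1 implementation: all function names are hypothetical, the characters -, 0 and + are coded as -1, 0 and +1, ties are resolved in favour of the first candidate, and a simple greedy exclusion of single columns stands in for the iterative optimization over single or variable combinations of columns.

```python
import numpy as np

def percentile_bounds(reference, lower=10.0, upper=90.0):
    """Lower/upper percentile of the reference samples for each column."""
    return (np.percentile(reference, lower, axis=0),
            np.percentile(reference, upper, axis=0))

def to_triple_matrix(values, lo, hi):
    """Code values as -1 (diminished), 0 (unchanged) or +1 (increased)."""
    values = np.asarray(values, dtype=float)
    return (values > hi).astype(int) - (values < lo).astype(int)

def disease_masks(codes, labels):
    """Most frequent triple matrix character per column for each class."""
    labels = np.asarray(labels)
    masks = {}
    for cls in sorted(set(labels.tolist())):
        rows = codes[labels == cls]
        # Count -1, 0 and +1 per column; argmax keeps the first on ties.
        counts = np.stack([(rows == c).sum(axis=0) for c in (-1, 0, 1)])
        masks[cls] = counts.argmax(axis=0) - 1
    return masks

def classify(codes, masks, columns):
    """Assign each sample to the mask with most positional coincidences."""
    classes = list(masks)
    hits = np.stack([(codes[:, columns] == masks[c][columns]).sum(axis=1)
                     for c in classes])
    return [classes[i] for i in hits.argmax(axis=0)]

def confusion_matrix(true, predicted, classes):
    """Rows: known class, columns: assigned class, entries in % of row.
    Assumes every class has at least one sample."""
    m = np.zeros((len(classes), len(classes)))
    for t, p in zip(true, predicted):
        m[classes.index(t), classes.index(p)] += 1
    return 100.0 * m / m.sum(axis=1, keepdims=True)

def sieve_columns(codes, labels, masks):
    """Greedy stand-in for the optimization: drop a column whenever its
    exclusion raises the sum of the diagonal values."""
    classes = list(masks)

    def diagonal_sum(cols):
        predicted = classify(codes, masks, cols)
        return np.trace(confusion_matrix(list(labels), predicted, classes))

    columns = list(range(codes.shape[1]))
    best = diagonal_sum(columns)
    improved = True
    while improved and len(columns) > 1:
        improved = False
        for col in list(columns):
            trial = [c for c in columns if c != col]
            score = diagonal_sum(trial)
            if score > best:
                columns, best, improved = trial, score, True
                break
    return columns, best
```

In such a sketch, a learning set would first be coded with `to_triple_matrix` using the bounds from `percentile_bounds` of its reference samples, then passed to `disease_masks` and `sieve_columns`; unknown samples would be coded with the same bounds and assigned with `classify` on the retained columns. Because the bounds are derived from the reference samples themselves, multiplying a column of both reference and patient values by the same instrument-dependent factor leaves the triple matrix characters unchanged.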

3. Examples

- FCS1.0 and 2.0 list mode files or dBase3 database exports from database (e.g. Access) or spreadsheet programs (e.g. Excel) are classified by the CLASSIF1 algorithm in various clinical situations.

4. Conclusions

- the CLASSIF1 algorithm provides access to predictive medicine with >95% correct disease course prediction in individual patients as well as to standardized diagnostic classifications.
- the CLASSIF1 approach facilitates the elaboration of interlaboratory consensus classifiers in complex multiparameter data sieving or data mining analysis.
- as a practical consequence, diseases can be classified at institutions where sufficient learning sets cannot be generated within a reasonable time or where costly investigations are necessary to establish appropriate learning sets.
- furthermore, the molecular and biochemical properties of many body cell systems during disease can be compared by standardized classification, e.g. blood leukocytes versus tissue or effusion leukocytes.

2004 G.Valet
Max-Planck-Institut für Biochemie, Am Klopferspitz 18a, D-82152 Martinsried, Germany, Tel: +49/89/8578-2518, -2525, Fax: +49/89/8578-2563, INTERNET: http://www.biochem.mpg.de/valet/cellbio.html
Last Update: Apr.10,2004