
Clinically relevant data patterns for individualized DLBCL outcome predictions
Günter Valet

Comment on: Grau et al. (2019)
Further details in: fig.1, fig.2, fig.3

The authors (1), like others before them (2), calculate gene signatures from RNA expression chips by correlation statistics. In their study, correlation maximization and Kaplan-Meier patient stratification identify four groups of diffuse large B-cell lymphoma (DLBCL) patients with different survival probabilities. The 105 signatures, derived from the 54630 available chip spot intensities, describe inhomogeneous patient groups. Molecular pathways specific to poor outcome are difficult to find in such inhomogeneous gene signatures, which are therefore of only limited clinical value for predicting the future course of the individual patient.

Discriminating data pattern classification (3), in contrast, addresses the future of the individual patient. The classification masks may serve as templates for individualized DLBCL outcome prediction minichips and as a starting point for the calculation of correlation statistics. Pretherapeutic outcome predictions may allow alternative therapies to be applied early in patients predicted to be non-survivors. The classification may also include multimolecular flow cytometry data on single cells (4, 5, 6, 7) as an additional source of information.

The analysis of gene expression data (tab.1a, 1b) by data pattern classification provides more individualized information than the classification of the gene signatures k001-k105_|u^s (tab.1c, 1d) obtained by correlation maximization. Direct classification of the gene spot sample strength values shows increased RNA expression levels in non-survivors for the genes WFIKKN2, CALCOCO2, TUSC2, SBNO2, AP5Z1, TEX261 and AL109703 (t=2.459, 1.044, 3.228, 4.557, 2.671, 2.763, 4.333 for f=233) and decreased levels for KMT2A, EEA1, QKI and MYRIP (t=3.994, 5.291, 1.823, 4.760 for f=233), with correlation coefficients |r| < 0.50 between the classification mask parameters. The gene signatures k007, k012 and k059 were increased in non-survivors (t=2.248, 3.312, 1.516 for f=233, with |r| < 0.50 between parameters). These results are in line with the earlier classification of another dataset (2, 3).
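The t values above are quoted with f=233 degrees of freedom. The text does not name the test; assuming a conventional pooled-variance (Student) two-sample t-test, f = n1 + n2 - 2 would imply 235 patients in the two groups combined. The minimal Python sketch below computes such a t value and checks the |r| < 0.50 condition between mask parameters. The function names are hypothetical and are not part of CLASSIF1.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom f."""
    n_a, n_b = len(group_a), len(group_b)
    f = n_a + n_b - 2  # degrees of freedom of the pooled variance estimate
    # Pooled variance; statistics.variance uses the n - 1 (sample) denominator.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / f
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, f


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weakly_correlated(columns, limit=0.5):
    """True if every pair of mask parameters (dict name -> values) has |r| < limit."""
    names = list(columns)
    return all(
        abs(pearson_r(columns[p], columns[q])) < limit
        for i, p in enumerate(names)
        for q in names[i + 1:]
    )
```

Passing the values of non-survivors as group_a gives a positive t for genes increased in non-survivors and a negative t for decreased genes.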

The CLASSIF1 algorithm detects shifts of the RNA expression value distributions between survivors and non-survivors. The data matrix is first converted into a triple matrix using lower and upper percentile thresholds (25% and 75%, or 30% and 70%) of the survivors' sample strength value distribution in each column: values below the lower threshold are labelled (-), values between the lower and upper thresholds (0), and values above the upper threshold (+). Systematic temporary removal of one or two data columns, in all possible combinations, from the classification matrix, followed by reclassification with the remaining columns, reveals the discriminatory potential of each data column. Non-discriminatory columns are excluded at the end. The most frequent triple matrix characters of the remaining discriminatory columns constitute the disease classification masks of survivors and non-survivors. Each patient is assigned to the mask with which the patient's record shows the highest match. Classification masks typically contain between 5 and 20 parameters (genes).
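The triple matrix conversion, mask building and matching steps can be illustrated with a minimal Python sketch. It assumes numeric records given as lists of floats, derives the thresholds from the survivors only, and uses linear-interpolation percentiles, which is only one of several percentile conventions; the original CLASSIF1 implementation may differ. Column selection is left out here, and all function names are hypothetical.

```python
from collections import Counter


def percentile(values, p):
    """Percentile p (0-100) by linear interpolation between sorted values."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def survivor_thresholds(survivor_rows, lower=25, upper=75):
    """Per-column (lower, upper) thresholds taken from the survivors only."""
    return [(percentile(col, lower), percentile(col, upper))
            for col in zip(*survivor_rows)]


def to_triple(row, limits):
    """Convert one patient record into a string of '-', '0' and '+' characters."""
    return "".join("-" if v < lo else "+" if v > hi else "0"
                   for v, (lo, hi) in zip(row, limits))


def build_mask(triple_rows):
    """Most frequent character in each column over the records of one class."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*triple_rows))


def classify(triple_row, masks):
    """Return the class whose mask agrees in the most positions (first class wins ties)."""
    return max(masks, key=lambda name: sum(a == b for a, b in zip(triple_row, masks[name])))
```

Passing lower=30 and upper=70 to survivor_thresholds gives the alternative 30-70% conversion.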

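The systematic temporary removal of one or two columns can be sketched in the same minimal style. The text does not specify how the discriminatory potential is scored, so the criterion below is an assumption chosen for illustration: a column is kept if removing it alone lowers the reclassification accuracy, or if removing it together with a second column lowers accuracy compared with removing only that second column (which catches pairs of redundant, correlated columns). Records are assumed to be already triple-converted strings, only a single pass is shown although the described process is iterative, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def masks_for(records, labels, keep):
    """Class masks (most frequent character per kept column), classes in sorted order."""
    masks = {}
    for cls in sorted(set(labels)):
        rows = [r for r, lab in zip(records, labels) if lab == cls]
        masks[cls] = [Counter(r[c] for r in rows).most_common(1)[0][0] for c in keep]
    return masks


def accuracy(records, labels, keep):
    """Fraction of records reclassified into their own class using only columns in keep."""
    masks = masks_for(records, labels, keep)
    hits = 0
    for r, lab in zip(records, labels):
        chars = [r[c] for c in keep]
        best = max(masks, key=lambda cls: sum(a == b for a, b in zip(chars, masks[cls])))
        hits += best == lab
    return hits / len(records)


def discriminatory_columns(records, labels):
    """Indices of columns judged discriminatory by single and pairwise removal.

    ASSUMED criterion (illustrative, not the published CLASSIF1 rule): column c is
    kept if removing it alone lowers accuracy, or if removing it in addition to
    another column d lowers accuracy compared with removing d alone.
    """
    every = list(range(len(records[0])))
    base = accuracy(records, labels, every)
    single = {c: accuracy(records, labels, [k for k in every if k != c]) for c in every}
    useful = {c for c in every if single[c] < base}
    for c, d in combinations(every, 2):
        pair = accuracy(records, labels, [k for k in every if k not in (c, d)])
        if pair < single[d]:  # removing c on top of d costs accuracy: c is not redundant
            useful.add(c)
        if pair < single[c]:  # removing d on top of c costs accuracy: d is not redundant
            useful.add(d)
    return sorted(useful)
```

Scoring every single column and every pair grows quadratically with the number of columns, which is manageable for the small masks of 5 to 20 parameters but would call for pre-filtering on a full chip.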

The algorithm enriches discriminating information at the expense of correlated information. The iterative information concentration process requires no statistics, no mathematical assumptions and no neural network layers. The classification is insensitive to outliers; missing values are tolerated up to a predetermined threshold, typically between 10 and 40%, beyond which a record is skipped; no imputed values are used; and the standardized classifiers can be exchanged between laboratories. The present classifications were performed separately for gene spots 1-27800 (A) (tab.1c, 1d), 27801-54630 (B) and 1-54630 (C) to assess how homogeneously the information is distributed across the chip. Average predictive values of 68.3% (A), 61.1% (B) and 59.3% (C) suggest that a more detailed evaluation of sliced and recombined information could further improve the classification results. Altogether, individualized predictions from clinical data deserve more attention.
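The missing-value rule and the reported predictive values can be sketched as follows. Missing triple characters are represented as None (an assumption), a record is skipped when its missing fraction exceeds max_missing (0.25 is an illustrative value inside the quoted 10-40% range), and matching simply ignores missing positions instead of imputing them. The text does not define "average predictive value"; the sketch assumes it is the mean of the positive and negative predictive values, with non-survivors as the positive class. All names are hypothetical.

```python
def usable(record, max_missing=0.25):
    """True if the fraction of missing values (None) does not exceed max_missing."""
    missing = sum(v is None for v in record)
    return missing / len(record) <= max_missing


def match_score(triple_row, mask):
    """Agreeing positions between a record and a mask; missing positions are skipped, not imputed."""
    return sum(a == b for a, b in zip(triple_row, mask) if a is not None)


def predictive_values(predicted, actual, positive="non-survivor"):
    """Positive and negative predictive value and their mean (ASSUMED meaning of 'average')."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    tn = sum(p != positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv, (ppv + npv) / 2
```

Running the same functions on the spot ranges A, B and C separately would yield the per-slice averages compared in the text.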


1. Grau, M., Lenz, G., Lenz, P. Dissection of gene expression datasets into clinically relevant interaction signatures via high dimensional correlation maximization. Nat. Commun. 10, 5417 (2019).
2. Rosenwald, A. et al. (Lymphoma/Leukemia Molecular Profiling Project). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346, 1937-1947 (2002).
3. Valet, G., Höffkes, H.G. Data pattern analysis for the individualised pretherapeutic identification of high-risk diffuse large B-cell lymphoma (DLBCL) patients by cytomics. Cytometry 59A, 232-236 (2004).
4. Valet, G. et al. White cell and thrombocyte disorders: Standardized, self-learning flow cytometric list mode data classification with the CLASSIF1 program system. Ann. N.Y. Acad. Sci. 677, 233-251 (1993).
5. Valet, G. Human cytome project: A new potential for drug discovery. In: Las Omicas genomica, proteomica, citomica y metabolomica: modernas tecnologias para desarrollo de farmacos. Ed.: Real Academia Nacional de Farmacia, Madrid, pp. 207-228 (2005).
6. Valet, G. Data pattern classification by data sieving (CLASSIF1 algorithm) (1995).
7. Vallangeon, B.D., Tyer, C., Williams, B., Lagoo, A.S. Improved detection of diffuse large B-cell lymphoma by flow cytometric immunophenotyping: Effect of tissue disaggregation method. Cytometry 90B, 455-461 (2016).

© 2023 G. Valet
Last update: Jun 04, 2022
First display: Feb 24, 2021