Clinically relevant data patterns for individualized DLBCL outcome predictions
The authors (1), like others before them (2), calculate gene signatures from RNA expression chips using correlation statistics. Correlation maximization and Kaplan-Meier patient stratification in their study identify four diffuse large B cell lymphoma (DLBCL) patient groups with different survival probabilities. The 105 signatures, derived from the 54630 available chip spot intensities, describe inhomogeneous patient groups. Molecular pathways specific to a bad outcome are difficult to identify from such inhomogeneous gene signatures, which are therefore of only limited clinical value for predicting the future course of individual patients.
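Because the patient groups above are compared by Kaplan-Meier survival probabilities, a minimal textbook Kaplan-Meier estimator may help readers follow that step. This is a generic sketch, not the procedure of reference (1); the function name kaplan_meier and the input layout (follow-up times plus a 1/0 event flag, where 0 means censored) are assumptions for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Hypothetical helper: return [(t, S(t))] at each distinct death time.

    times  : follow-up time per patient (any consistent unit)
    events : 1 if death was observed at that time, 0 if censored
    """
    # Number of observed deaths at each distinct time point.
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve = []
    survival = 1.0
    for t in sorted(deaths):
        # Patients still at risk just before time t (censored at t included).
        at_risk = sum(1 for x in times if x >= t)
        # Product-limit step: multiply by the conditional survival at t.
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve
```

For five patients with times [2, 3, 3, 5, 8] and events [1, 1, 0, 1, 0], the estimated survival drops to about 0.8, 0.6 and 0.3 at times 2, 3 and 5; stratified curves would be obtained by calling the function separately for each patient group.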
Discriminating data pattern classification (3), in contrast, addresses the future course of individual patients. The classification masks may serve as templates for minichips for individualized DLBCL outcome prediction and as a starting point for calculating correlation statistics. Pretherapeutic outcome predictions may permit the early application of alternative therapies for predicted non-survivors. The classification may also include multimolecular flow cytometry data on single cells (4-7) as an additional source of information.
The analysis of gene expression data (tab. 1a, 1b) by data pattern classification provides more individualized information than the classification of the gene signatures k001-k105_|u^s (tab. 1c, 1d) obtained by correlation maximization. The direct classification of gene spot sample strength values shows increased RNA expression levels in non-survivors for the WFIKKN2, CALCOCO2, TUSC2, SBNO2, AP5Z1, TEX261 and AL109703 genes (t=2.459, 1.044, 3.228, 4.557, 2.671, 2.763, 4.333 for f=233) and decreased levels for KMT2A, EEA1, QKI and MYRIP (t=3.994, 5.291, 1.823, 4.760 for f=233), with correlation coefficients |r| < 0.50 between classification mask parameters. Gene signatures k007, k012 and k059 were increased in non-survivors (t=2.248, 3.312, 1.516 for f=233, with correlation coefficients |r| < 0.50 between parameters). The results are consistent with the earlier classification of another dataset (2, 3).
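The t values and the |r| < 0.50 condition above can be illustrated with a short sketch. It assumes that f denotes the degrees of freedom of a pooled-variance two-sample Student t test (f = n1 + n2 - 2) and that the correlation limit is enforced greedily over candidate parameters; neither assumption is stated in the text, and the names student_t, pearson_r and weakly_correlated are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom f (assumed test)."""
    n1, n2 = len(a), len(b)
    f = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / f
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, f


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined for constant inputs)."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def weakly_correlated(columns, limit=0.5):
    """Greedily keep parameters whose |r| with every already kept one is below limit."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson_r(values, columns[k])) < limit for k in kept):
            kept.append(name)
    return kept
```

For survivors' values [1, 2, 3] and non-survivors' values [5, 6, 7], student_t gives t of about 4.899 with f = 4. The greedy order means the first listed parameter is always kept; a different ordering rule would be an equally valid assumption.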
The CLASSIF1 algorithm detects shifts in the RNA expression value distributions between survivors and non-survivors. Values below the lower percentile threshold of the survivors' sample strength value distribution are labelled (-), values between the lower and upper thresholds (0) and values above the upper threshold (+) (triple matrix conversion using percentile thresholds of 25-75% or 30-70%). Systematic temporary removal of one or two data columns in all possible combinations from the classification matrix, followed by reclassification of the remaining matrix columns, reveals the discriminatory potential of each data column. Non-discriminatory columns are excluded at the end. The most frequent triple matrix characters of the remaining discriminatory columns constitute the disease classification masks of survivors and non-survivors. Individual patients are assigned to the class whose mask they match most closely. Classification masks typically contain between 5 and 20 parameters (genes).
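The steps described above (triple matrix conversion, mask construction from the most frequent characters, classification by highest match, and column elimination) can be sketched as follows. This is a simplified illustration, not the CLASSIF1 implementation: all function names are hypothetical, percentiles are taken with Python's statistics.quantiles ("inclusive" method), mask ties go to the first character encountered, and the column check removes only one column at a time (the text also removes pairs) and treats a column as non-discriminatory if its removal does not lower self-classification accuracy, which is an assumed criterion.

```python
from collections import Counter
from statistics import quantiles


def thresholds(survivor_values, low=25, high=75):
    """Lower/upper percentile cut-offs from the survivors' values of one gene."""
    cuts = quantiles(survivor_values, n=100, method="inclusive")
    return cuts[low - 1], cuts[high - 1]


def to_triple(value, lo, hi):
    """Triple matrix conversion: '-' below lo, '0' between, '+' above hi."""
    if value is None:
        return None  # missing values stay missing; nothing is imputed
    if value < lo:
        return "-"
    if value > hi:
        return "+"
    return "0"


def build_mask(coded_rows):
    """Most frequent triple character per column across one patient group."""
    mask = []
    for column in zip(*coded_rows):
        counts = Counter(c for c in column if c is not None)
        mask.append(counts.most_common(1)[0][0] if counts else None)
    return mask


def match(coded_row, mask):
    """Number of columns in which a patient agrees with a mask."""
    return sum(1 for c, m in zip(coded_row, mask) if c is not None and c == m)


def classify(coded_row, masks):
    """Assign the class with the highest match; ties stay unclassified (None)."""
    scores = {label: match(coded_row, m) for label, m in masks.items()}
    best = max(scores.values())
    winners = [label for label, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


def accuracy(groups, keep):
    """Fraction of patients re-assigned to their own group using columns `keep`."""
    sub = {g: [[row[i] for i in keep] for row in rows] for g, rows in groups.items()}
    masks = {g: build_mask(rows) for g, rows in sub.items()}
    hits = total = 0
    for g, rows in sub.items():
        for row in rows:
            hits += classify(row, masks) == g
            total += 1
    return hits / total


def non_discriminatory(groups, n_cols):
    """Columns whose single temporary removal does not lower accuracy."""
    full = accuracy(groups, range(n_cols))
    return [i for i in range(n_cols)
            if accuracy(groups, [k for k in range(n_cols) if k != i]) >= full]
```

As a usage example, survivor values [1, 2, 3, 4, 5] give 25th/75th percentile thresholds of 2.0 and 4.0, so a value of 4.5 is coded "+". In a toy two-group matrix where only the first column separates the groups, non_discriminatory returns the index of the second column, which would then be dropped before the final masks are built.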
The algorithm enriches discriminating information at the expense of correlated information. The iterative information concentration process requires no statistics, no mathematical assumptions and no neural network layers. The classification process tolerates outliers, and missing values are tolerated up to a predetermined threshold, typically 10-40%, beyond which a record is skipped. No assumed values are imputed, and the standardized classifiers can be exchanged between laboratories. The present classifications were performed for gene spots 1-27800 (A) (tab. 1c, 1d), 27801-54630 (B) and 1-54630 (C) to assess how homogeneously the information is distributed. Average predictive values of 68.3% (A), 61.1% (B) and 59.3% (C) suggest that a more detailed evaluation of sliced and recombined information could further improve the classification results. Altogether, it seems important to devote more attention to individualized predictions from clinical data.
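The missing-value rule and the average predictive value mentioned above can be made concrete with a small sketch. The text does not define "average predictive value"; the code assumes it is the mean of the positive predictive value (predicted non-survivors who did not survive) and the negative predictive value (predicted survivors who survived). The 25% default for max_missing is simply one value inside the quoted 10-40% range, and the names usable and predictive_values are hypothetical.

```python
def usable(record, max_missing=0.25):
    """Keep a record only if its fraction of missing values (None) is within max_missing."""
    missing = sum(1 for v in record if v is None)
    return missing / len(record) <= max_missing


def predictive_values(predicted, actual, positive="non_survivor"):
    """Return (PPV, NPV, their average); unclassified patients (None) are excluded.

    Raises ZeroDivisionError if one class was never predicted.
    """
    pairs = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    tn = sum(1 for p, a in pairs if p != positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv, (ppv + npv) / 2
```

For example, two predicted non-survivors of whom one died and four predicted survivors of whom three survived give a PPV of 0.5, an NPV of 0.75 and an average of 0.625. Other definitions (such as averaging over all classified patients) would yield different figures.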
© 2023 G. Valet