Cell Biochemistry Martinsried

Towards a Human Cytome Project

G. Valet 1), A. Tárnok 2)

1) Max-Planck-Institut für Biochemie, Martinsried, Germany
2) Pediatric Cardiology, Heart Center Leipzig GmbH, University Hospital Leipzig, Germany


The sequencing of the human genome has substantially increased knowledge of the biomolecular capacity of organisms. Nevertheless, only a very limited part of the observed structural and functional multilevel biocomplexity of cells and cell systems (cytomes) can as yet be explained by this knowledge.

The prediction of three-dimensional (3D) protein structures from their amino acid sequences is a typical example of the complexity problems encountered already at the biomolecular level, which is still far from the structural and functional complexity of entire cells. Exact predictions of protein structure from amino acid sequences remain difficult after more than 30 years of intensive research, despite the explosive growth of hardware and software capacities in the meantime.


Considering the far more difficult predictions of the association and functionality of biomolecules in viable cells, as products of the 20,000-30,000 coding gene sequences, it presently seems out of reach to understand the enormous biocomplexity of cells or cellular systems (cytomes) in the near future exclusively by the traditional deductive top-down approach of hypothesis formulation followed by experimental verification. The high redundancy of molecular pathways, as in cell signalling, cell proliferation or apoptosis, requires a large number of hypothesis-driven top-down investigations that collect a multitude of details without certainty that they have focused on the ultimately relevant disease-associated metabolic pathways, molecular hotspots or pharmaceutical targets. Alternatively, the bottom-up approach of single cell molecular phenotype analysis of entire cell systems (cytomics), as molecular cell systems research, represents a kind of multilevel reverse engineering strategy and an inductive complement (fig.1) to deductive approaches. In this way, a standardised relational framework of directly disease-related molecular interrelations can be established and subsequently complemented by the required details.

The predictive medicine by cytomics concept (L1, L2, L3, L4) was developed as a consequence of these considerations. It has recently led to thoughts about the challenges of a human cytome project, which have raised interest (Focus on Microscopy 2004).

Information collection

Cell systems are composed of various types of single cells as the elementary building units of organs and organisms. Measurements on cell or tissue homogenates yield averaged results: it remains open whether a measured molecular change reflects a uniform change in all cells or a change confined to a specific cell subpopulation, and changes in low-frequency cell populations may be lost by dilution. Single cell analysis of the molecular cell phenotype, as it results from genotype and exposure, overcomes this problem.
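The averaging problem can be made concrete with a small arithmetic sketch. The two scenarios and the threshold of 5.0 below are purely hypothetical numbers chosen for illustration: a tenfold increase in 2% of the cells and an 18% increase in all cells give the same homogenate mean, while a single cell readout separates them immediately.

```python
from statistics import mean

def bulk_mean(values):
    """Average signal of a homogenate: all cells pooled, no single cell resolution."""
    return mean(values)

def fraction_above(values, threshold=5.0):
    """Single cell view: fraction of cells above a (hypothetical) signal threshold."""
    return sum(v > threshold for v in values) / len(values)

# Hypothetical scenario A: a 2% subpopulation raises its signal tenfold.
scenario_a = [1.0] * 98 + [10.0] * 2
# Hypothetical scenario B: every cell raises its signal by 18%.
scenario_b = [1.18] * 100

print(round(bulk_mean(scenario_a), 2), round(bulk_mean(scenario_b), 2))  # 1.18 1.18
print(fraction_above(scenario_a), fraction_above(scenario_b))            # 0.02 0.0
```

The same bulk value arises from two biologically different situations; only the per-cell distribution distinguishes them.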

Essential progress in single cell molecular analysis is achieved by the continuous development of new image and flow cytometric instrumentation like confocal or laser scanning microscopy with multicolor or spectral imaging, multiphoton fluorescence excitation, fluorescence resonance energy transfer (FRET), fast fluorescence imaging in flow, optical cell stretching in flow, miniaturised flow cytometry within laboratories on a chip or laser microdissection of tissues. Biomolecular analysis techniques like bead arrays, single cell polymerase chain reaction (PCR), tyramide signal amplification or biomolecule labeling by quantum dots, magnetic nanobeads and aptamers open new horizons of sensitivity and molecular specificity at the single cell level.

The dimensionality of cell data can be substantial, for example in multilevel molecular cell profiling with repeated multicolor fluorescence staining protocols on numerous different cell populations. Viable cells may first be stained for cell functions like intracellular pH, transmembrane potential or Ca2+ levels, then fixed to remove the functional stains and restained for specific extra- or intracellular constituents like antigens, lipids or carbohydrate structures. After destaining, specific nucleic acids may be stained. Microscopic image capture and analysis systems with cell relocation capacity will increasingly permit such staining sequences. Traditional visual and quantitative evaluation of gated two- or three-dimensional cytometric histograms collects only a very limited part of the available information, and one is never certain whether the really relevant information has been collected. Experience has also shown that it is not easy to provide quality-controlled consensus strategies for multiparameter data evaluation.
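Why manual inspection of two- or three-parameter histograms quickly becomes impractical can be shown with simple combinatorics. The protocol below (2 light scatter signals plus 4 fluorescence colors in each of 3 staining rounds, evaluated for 5 cell populations) is a hypothetical example, not a protocol from the text.

```python
from math import comb

def histogram_counts(n_parameters):
    """Distinct 2- and 3-parameter combinations, i.e. possible 2D and 3D histograms."""
    return comb(n_parameters, 2), comb(n_parameters, 3)

# Hypothetical protocol: 2 light scatter signals plus 4 fluorescence colors
# in each of 3 sequential staining rounds on the same cells.
n_params = 2 + 4 * 3
pairs, triples = histogram_counts(n_params)
print(n_params, pairs, triples)  # 14 91 364

# The same histograms, hypothetically repeated for 5 gated cell populations:
print(5 * pairs, 5 * triples)  # 455 1820
```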

There is also little pre-existing knowledge for interpreting very complex cytometric multiparameter data spaces. Essential information may therefore be lost simply due to lack of awareness.

Data analysis strategies

Considering the effort required for sample collection, staining, measurement and data analysis, it seems mandatory to routinely use automated, self-gating evaluation strategies that extract the entire information content of all measured cells for subsequent knowledge extraction. In flow cytometry, for example, this means in practice that the percent frequency, the means or medians of light scatter and fluorescence signals, light scatter and fluorescence signal ratios, as well as the coefficients of variation of all parameters in all evaluation gates should be calculated and stored in a database. The aim should be to collect this information for more than 95% of all measured cells, so as to be reasonably certain that no relevant information escapes the analysis.
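A minimal sketch of such a per-gate parameter table follows. It assumes events are held as plain dicts of signal values and gates as simple predicates; the names `add_ratio`, `gate_statistics`, the gate names and the event values are hypothetical and do not refer to any particular cytometry software. The returned coverage figure supports the check that more than 95% of cells were evaluated.

```python
from statistics import mean, median, pstdev

def add_ratio(events, numerator, denominator):
    """Add a per-cell signal ratio (e.g. fluorescence / light scatter) as a new parameter."""
    name = f"{numerator}/{denominator}"
    for e in events:
        e[name] = e[numerator] / e[denominator] if e[denominator] else float("nan")
    return name

def gate_statistics(events, gates, parameters):
    """Percent frequency plus mean, median and CV (%) of every parameter in every gate.

    events: one dict per cell, parameter name -> signal value
    gates:  gate name -> predicate deciding whether a cell belongs to the gate
    Returns the per-gate table and the percentage of cells covered by at least one gate.
    """
    n_total = len(events)
    table, covered = {}, set()
    for gate_name, in_gate in gates.items():
        members = [i for i, e in enumerate(events) if in_gate(e)]
        covered.update(members)
        row = {"percent": 100.0 * len(members) / n_total}
        for p in parameters:
            values = [events[i][p] for i in members]
            if not values:
                continue
            m = mean(values)
            row[f"{p}_mean"] = m
            row[f"{p}_median"] = median(values)
            row[f"{p}_cv"] = 100.0 * pstdev(values) / m if m else float("nan")
        table[gate_name] = row
    return table, 100.0 * len(covered) / n_total

# Hypothetical events and gates for illustration only.
events = [
    {"FSC": 100, "SSC": 20, "FL1": 50},
    {"FSC": 110, "SSC": 25, "FL1": 55},
    {"FSC": 300, "SSC": 200, "FL1": 10},
    {"FSC": 320, "SSC": 220, "FL1": 12},
]
ratio = add_ratio(events, "FL1", "FSC")
gates = {"low_SSC": lambda e: e["SSC"] < 100, "high_SSC": lambda e: e["SSC"] >= 100}
table, coverage = gate_statistics(events, gates, ["FSC", "FL1", ratio])
if coverage < 95.0:
    print("warning: fewer than 95% of cells were evaluated")
print(coverage, table["low_SSC"]["percent"], table["low_SSC"]["FSC_mean"])
```

Each row of `table` would become one block of columns in the database; with many gates and parameters this is how several thousand data columns per measurement arise.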

It is, for example, empirically advisable to use self-adapting and contiguous gates for the automated evaluation of well-known flow cytometric cell population entities like lympho-, mono- or granulocytes, as defined by forward (FSC) or side (SSC) light scatter or by typical antigenic properties like the expression of the CD45 antigen. The subsequent fluorescence gating can equally be automated by standard quadrant evaluation at fixed threshold levels in the gated two-parameter histograms. At this stage it is of no primary importance that cell population boundaries are respected, since relevant information will be picked up wherever it lies by the subsequent data sieving analysis for the most discriminatory data patterns, provided the information of more than 95% of all cells has been accessed during the information collection phase.
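Quadrant evaluation at fixed thresholds is simple enough to sketch directly. The marker names (CD4, CD8), the threshold of 100 channel units and the event values are hypothetical; in practice the events would be those already inside an automated scatter or CD45 gate.

```python
def quadrant_percentages(events, x, y, x_threshold, y_threshold):
    """Percent of gated cells in each quadrant of an x/y two-parameter histogram.

    Keys: first sign for x, second for y ('+' at or above threshold, '-' below).
    """
    counts = {"--": 0, "+-": 0, "-+": 0, "++": 0}
    for e in events:
        key = ("+" if e[x] >= x_threshold else "-") + ("+" if e[y] >= y_threshold else "-")
        counts[key] += 1
    n = len(events)
    return {k: 100.0 * v / n for k, v in counts.items()} if n else counts

# Hypothetical lymphocyte-gated events with two fluorescence parameters.
lymphocytes = [
    {"CD4": 500, "CD8": 10},
    {"CD4": 20, "CD8": 400},
    {"CD4": 15, "CD8": 12},
    {"CD4": 600, "CD8": 15},
]
print(quadrant_percentages(lymphocytes, "CD4", "CD8", 100, 100))
# {'--': 25.0, '+-': 50.0, '-+': 25.0, '++': 0.0}
```

Because the thresholds are fixed rather than adjusted per sample, the evaluation is fully automatic; any cells that straddle a population boundary are still counted in some quadrant and remain available to the later data sieving step.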

Bioinformatic knowledge extraction

Knowledge extraction after generalised information collection is an essential task. The collected information may easily amount to several thousand data columns per set of measurements. Classifying such numbers of data columns by statistics, principal component analysis, fuzzy logic, neural network or self-organizing matrix analysis is frequently beyond the capacity of typical software packages. Furthermore, classification results from these strategies may depend to some degree on assumed mathematical distributions of parameter values or, in cluster analysis, on predefined levels of correlation coefficients. Missing experimental values may have to be reconstituted by assumption, or data records have to be discarded, either of which may influence the final classification result.

Data sieving, as an alternative non-statistical knowledge extraction strategy, requires no mathematical assumptions on value distributions, and missing values do not have to be reconstituted. The analysis is suitable for parallel computation and inherently fast, because classification requires only data thresholding.
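The following is an illustrative sketch of threshold-only classification in the spirit of data sieving, not the authors' published algorithm. Its details are assumptions: each value is coded as '-', '0' or '+' relative to the min/max of a reference group, missing values are coded '?' and simply skipped, each class is summarised by its most frequent code per parameter (a class mask), and a sample is assigned to the class whose mask it matches at the most positions. All data values are hypothetical.

```python
from collections import Counter

def reference_bounds(reference_rows):
    """Per-parameter (low, high) thresholds; assumed here: min/max of reference values."""
    bounds = []
    for column in zip(*reference_rows):
        known = [v for v in column if v is not None]
        bounds.append((min(known), max(known)))
    return bounds

def encode(row, bounds):
    """Threshold each value into '-', '0' or '+'; missing values become '?' (no imputation)."""
    code = []
    for value, (low, high) in zip(row, bounds):
        if value is None:
            code.append("?")
        elif value < low:
            code.append("-")
        elif value > high:
            code.append("+")
        else:
            code.append("0")
    return "".join(code)

def class_mask(codes):
    """Most frequent known code per parameter position within one class."""
    mask = []
    for position in zip(*codes):
        known = [c for c in position if c != "?"]
        mask.append(Counter(known).most_common(1)[0][0] if known else "?")
    return "".join(mask)

def match_score(code, mask):
    """Count agreeing positions; positions missing in either string are skipped."""
    return sum(c == m for c, m in zip(code, mask) if "?" not in (c, m))

def classify(code, masks):
    """Assign the class whose mask agrees at the most positions (first class wins ties)."""
    return max(masks, key=lambda name: match_score(code, masks[name]))

# Hypothetical training data: three parameters, one missing value in the disease group.
normal = [[1.0, 10.0, 5.0], [1.2, 11.0, 5.5], [0.9, 9.5, 4.8], [1.1, 10.5, 5.2]]
disease = [[2.0, 10.0, 8.0], [2.5, 9.8, 7.5], [1.9, None, 9.0]]
bounds = reference_bounds(normal)
masks = {
    "normal": class_mask([encode(r, bounds) for r in normal]),
    "disease": class_mask([encode(r, bounds) for r in disease]),
}
print(masks)  # {'normal': '000', 'disease': '+0+'}
print(classify(encode([2.2, None, 8.2], bounds), masks))  # disease

# Parameter positions whose masks differ are the discriminatory ones.
print([i for i, (a, b) in enumerate(zip(masks["normal"], masks["disease"])) if a != b])  # [0, 2]
```

Every step is a comparison against a threshold or a count, so parameters and samples can be processed independently and in parallel.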

Relational cell classification system

Multiparametric flow cytometers or microscopes are complex instruments, and no two instruments will provide identical results on a given sample, even when built from the same parts. This is caused by the tolerances of the multitude of electronic, optical and mechanical components of such instruments. Fluorescence and light scatter signals are measured on relative scales, and cell population oriented histogram gating remains to some extent arbitrary. These accuracy errors largely cancel when all parameter values are expressed relationally, as fractions of the mean results of suitable reference groups. The relational expression conserves the relative individual positions of the parameter means and their coefficients of variation as a measure of the dispersion of the parameter value distributions.
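The cancellation can be sketched for the simplest case of two instruments that differ only by a multiplicative gain. The gains (1.0 and 1.3) and the signal values are hypothetical. Note that this demonstrates only the multiplicative case; additive offsets or non-linear responses would not cancel by this division.

```python
from statistics import mean, pstdev

def relational(values, reference_values):
    """Express values as fractions of the reference-group mean (dimensionless)."""
    ref_mean = mean(reference_values)
    return [v / ref_mean for v in values]

def cv(values):
    """Coefficient of variation in percent; unchanged by a multiplicative gain."""
    return 100.0 * pstdev(values) / mean(values)

true_reference = [100.0, 110.0, 90.0]  # hypothetical reference-group signals
true_patient = [120.0, 150.0]          # hypothetical patient signals

for gain in (1.0, 1.3):                # two hypothetical instruments
    ref = [gain * v for v in true_reference]
    pat = [gain * v for v in true_patient]
    # Both instruments yield the same relational values and the same CV.
    print(gain, [round(x, 3) for x in relational(pat, ref)], round(cv(ref), 3))
```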

Reference groups of the same type, when established in different laboratories, will be indistinguishable when classified against each other, provided they are composed of representative reference individuals or samples and measured with long-term precision and specific reagents. Reference groups can be defined by consensus. In this way, standardised, laboratory-independent classification of relational data is possible, and relational databases from different laboratories can be merged into larger standard databases. If the classification reveals differences between correctly composed reference groups from different laboratories, this indicates methodological or ethnic differences.

In this way, a relational system for the objective molecular description of diseases and of elementary cellular states like differentiation, maturation, division or malignancy can be established at the cellular level. Different cell types will stand in a standardised relation to each other, in a kind of periodic system of cells.

Human cytome project

With these considerations in mind, three major levels of a human cytome project can presently be distinguished.

The first level addresses the behaviour of cells during their life cycle, including cell cycle control, biomolecule synthesis, import and export of molecules, energy and oxido-reductive balance and organelle functions, to name only some important phenomena. It also seems important to address the very significant dispersion of many cell population parameters, ranging from 1:10 to 1:10,000 as evidenced by the coefficients of variation of their value distribution curves. The multiparametric molecular heterogeneity of cells in its combinatorial multiplicity may be highly important for the reliable adaptation of cells to new conditions and for susceptibility or resistance to disease or therapy.

At the second level, single cell preparations, either as collected or after mechanical or enzymatic preparation, are investigated by flow or image cytometry to determine the molecular status of normal or diseased cells as descriptors of health and disease. This discrimination does not necessarily depend on the original in-situ assembly of cells within tissues, provided conclusions can be derived from the molecular status of specific cells or cell combinations.

The third level concerns cells in assembled tissues. The molecular interrelation and proximity of cells within an intact cellular architecture can be studied under the most complex conditions and at the ultimate organisational level of cell systems in health and disease.


- Since diseases emerge from deviations of typical molecular processes in cells or cell systems, detailed molecular knowledge of the enormous biocomplexity of such systems under normal and disease conditions is required to understand the mechanisms of disease processes.

- The necessary knowledge is obtained from multilevel single cell molecular analysis close to in-vivo conditions in combination with exhaustive bioinformatic knowledge extraction (cytomics).

- Results are stored in a standardised relational knowledge system or framework, both for scientific hypothesis development and for directly medicine-related purposes: predictive medicine by cytomics (the prediction of the therapy-dependent future disease course in individual patients), personalized therapy and the search for new pharmaceutical targets.

- The establishment of such a system, using the various single cell oriented molecular technologies in conjunction with specific biomolecule labelling in a specially focused human cytome project, represents a combined challenge to science, medicine and technological innovation.


L1. Valet G: Predictive Medicine by Cytomics: Potential and Challenges. JBRHA 2002, 16:164-167
L2. Valet G, Tárnok A: Cytomics in Predictive Medicine. Cytometry 2003, 53B:1-3
L3. Valet G, Repp R, Link H, Ehninger G, Gramatzki M, SHG-AML study group: Pretherapeutic identification of high risk acute myeloid leukemia (AML) patients from immunophenotype, cytogenetic and clinical parameters. Cytometry 2003, 53B:4-10
L4. Valet G, Leary JF, Tárnok A: Cytomics - New Technologies: Towards a Human Cytome Project. Cytometry 2004, in press
L5. Valet G, Hoeffkes HG: Data pattern analysis for the individualised pretherapeutic identification of high-risk large B-cell lymphoma (DLBCL) patients by cytomics. Cytometry 2004, in press

© 2024 G.Valet
1965-2006: Max-Planck-Institut für Biochemie, Am Klopferspitz 18a, D-82152 Martinsried, Germany
Last update: Apr 10,2004
First display: Jan 14,2004