Towards a Human Cytome Project

G. Valet 1), A. Tárnok 2)
1) Cell Biochemistry, Max-Planck-Institut für Biochemie, Martinsried, Germany
2) Pediatric Cardiology, Heart Center Leipzig GmbH, University Hospital Leipzig, Germany


The sequencing of the human genome has very significantly increased knowledge of the biomolecular capacity of organisms. Nevertheless, this knowledge can as yet explain only a very limited part of the observed structural and functional multilevel biocomplexity of cells and cell systems (cytomes).

The prediction of three-dimensional (3D) protein structures from their amino acid sequences is an example of the complexity problems already encountered at the biomolecular level, although the molecular level is still far removed from the structural and functional complexity of entire cells. Exact predictions of protein structure from amino acid sequences remain difficult after more than 30 years of intensive research and despite the explosive development of hardware and software capacities in the meantime.

In the pharmaceutical industry, the transition from the earlier physiology-oriented to target-oriented drug discovery has produced significantly fewer new candidate molecules during the last 10 years than during the preceding 10 years, despite substantially increased investment. This indicates that more knowledge of disease-relevant molecular mechanisms and pathways is required.


Considering the high complexity of the association of biomolecules resulting from the activity of 20,000 to 30,000 coding genes, it will be difficult to understand the biocomplexity of cells or cellular systems (cytomes) by traditional hypothesis-driven experimental research. The high redundancy of molecular pathways, for example in cell signalling, cell proliferation or apoptosis, requires a large number of hypothesis-induced investigations to collect a multitude of details, without any certainty of having focused on the ultimately relevant disease-associated metabolic pathways, molecular hotspots or pharmaceutical targets.

Alternatively, the high number of measurable biomolecules can be used for concept-driven research by data mining, for example by data pattern analysis. The concept consists of using multiparameter single cell analysis by cytomics for individualized disease course prediction in patients under therapy. Diseases are caused by molecular changes in cells. Cells constitute the elementary functional units of cell systems and organisms, and differential changes of molecular single cell phenotypes can be determined by data pattern classification, for example between diseased and healthy persons. Molecular cell phenotypes result from genotype and exposure.

Hypothesis-driven parameter selection from the genome, as the inventory list of the biomolecular capacity of organisms, in conjunction with hypothesis-free data pattern classification of differential cell phenotypes opens knowledge spaces that were previously inaccessible to hypothesis development due to a lack of pre-existing knowledge. The concept of predictive medicine by cytomics for personalized medicine was developed as one of the consequences of data pattern classification. Its application potential is significantly wider than that of personalized medicine by pharmacogenomics or genomics, which is restricted to genetic variants.

The use of nature-induced systematic perturbations of cell phenotypes in disease for the acquisition of predictive differential data patterns as molecular hotspots and disease equivalents (top-down) significantly simplifies disease-related research. The information in the differentially determined molecular hotspots can be deciphered in a second phase by biomedical cell systems biology, to assess and model disease-specific molecular pathways for the discovery of new drug targets and lead structures. This disease-oriented approach bypasses the time-consuming phase of systematic perturbation of model cell systems to gain insight into the reactivity of such systems as a prerequisite for the study of disease processes, as suggested by the systems biology (bottom-up) concept.

As one of the consequences of these considerations, thoughts about the challenges of a human cytome project were articulated. They raised interest and resulted in an initiative for the formal establishment of such a project (L1-L5).

Information collection

Cell systems are composed of various types of single cells as the elementary building units of organs and organisms. Analysis of cell or tissue homogenates yields only averaged results: measured molecular changes may be wrongly attributed to uniform changes in all cells when they are in fact due to changes in specific cell subpopulations, while changes in low-frequency cell populations may be lost by dilution. Single cell analysis of the molecular cell phenotype, as it results from genotype and exposure, overcomes this loss of resolution.
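The averaging problem described above can be illustrated with a small numerical sketch. The cell counts, signal levels and the function name `bulk_mean` are hypothetical values chosen for illustration only, not data from any study. The sketch shows that a 10-fold change in 1% of the cells and a 1.09-fold change in all cells produce the same homogenate average.

```python
from statistics import mean

def bulk_mean(subpop_fraction, baseline, fold_change, n_cells=10_000):
    """Average signal of a homogenate in which only a fraction of the cells changes."""
    n_changed = round(subpop_fraction * n_cells)
    signals = [baseline * fold_change] * n_changed + [baseline] * (n_cells - n_changed)
    return mean(signals)

# Case A: 1% of the cells show a 10-fold increase, all other cells are unchanged.
rare_strong = bulk_mean(0.01, baseline=100.0, fold_change=10.0)

# Case B: every cell shows a 1.09-fold increase.
uniform_weak = bulk_mean(1.0, baseline=100.0, fold_change=1.09)

# Both averages are close to 109, so the homogenate cannot tell the cases apart.
print(rare_strong, uniform_weak)
```

Only the per-cell signal lists distinguish the two cases, which is the information that single cell analysis retains.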

Essential progress in single cell molecular analysis is being achieved by the continuous development of new image and flow cytometric instrumentation like confocal or laser scanning microscopy with multicolor or spectral imaging, multiphoton fluorescence excitation, fluorescence resonance energy transfer (FRET), fast fluorescence imaging in flow, optical cell stretching in flow, miniaturised flow cytometry within laboratories on a chip or laser microdissection of tissues. Biomolecular analysis techniques like bead arrays, single cell polymerase chain reaction (PCR), tyramide signal amplification or biomolecule labeling by quantum dots, magnetic nanobeads and aptamers open new horizons of sensitivity and molecular specificity at the single cell level.

The dimensionality of cell data can be substantial, for example in multilevel molecular cell profiling with repeated multicolor fluorescence staining protocols on numerous different cell populations. Viable cells may initially be stained for cell functions like intracellular pH, transmembrane potentials or Ca2+ levels, followed by fixation to remove the functional stains and restaining for specific extra- or intracellular constituents like antigens, lipids or carbohydrate structures. After destaining, specific nucleic acids may be stained. Microscopic image capture and analysis systems will increasingly permit such staining sequences through their relocation capacities. Traditional visual and quantitative evaluations of gated two- or three-dimensional cytometric histograms collect only a very limited amount of the available information, and one is never certain whether the truly relevant information has been collected. Experience has also shown that it is not easy to provide quality-controlled consensus strategies for multiparameter data evaluation.

There is also little pre-existing knowledge on how to interpret very complex cytometric multiparameter data spaces. Essential information may therefore be lost simply due to a lack of awareness.

Data analysis strategies

Considering the effort required for sample collection, staining, measurement and data analysis, it seems mandatory to routinely use automated, self-gating evaluation strategies to extract the entire information content of all measured cells for the subsequent knowledge extraction. In practice, for example in flow cytometry, this means that the percent frequency, the means or medians of light scatter and fluorescence signals, the light scatter and fluorescence signal ratios, and the coefficients of variation for all parameters in all evaluation gates should be calculated and stored in a database. The aim should be to collect this information for more than 95% of all measured cells, to be reasonably certain that no relevant information escapes the analysis.
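As a minimal sketch of such an exhaustive per-gate evaluation, the following Python fragment computes the quantities named above for each gate and checks the fraction of measured cells covered. The function name `summarise_gate`, the data layout (one dictionary per cell) and the example values are assumptions made for illustration, and the coefficient of variation is taken here as the population standard deviation divided by the mean.

```python
from statistics import mean, median, pstdev

def summarise_gate(gate_name, cells, total_cells):
    """Return one database row of summary values for the cells in one gate."""
    row = {"gate": gate_name,
           "percent_frequency": 100.0 * len(cells) / total_cells}
    if not cells:
        return row
    params = sorted(cells[0])
    for p in params:
        values = [cell[p] for cell in cells]
        m = mean(values)
        row[p + "_mean"] = m
        row[p + "_median"] = median(values)
        # Coefficient of variation in percent (undefined for a zero mean).
        row[p + "_cv_percent"] = 100.0 * pstdev(values) / m if m else float("nan")
    # Mean of the per-cell signal ratios for every pair of parameters.
    for i, a in enumerate(params):
        for b in params[i + 1:]:
            ratios = [cell[a] / cell[b] for cell in cells if cell[b] != 0]
            if ratios:
                row[a + "/" + b + "_mean_ratio"] = mean(ratios)
    return row

# Hypothetical gated events: light scatter signals per cell.
gated = {
    "lymphocytes": [{"FSC": 100.0, "SSC": 20.0}, {"FSC": 120.0, "SSC": 30.0}],
    "granulocytes": [{"FSC": 200.0, "SSC": 100.0}],
}
total_measured = 3

database = [summarise_gate(name, cells, total_measured) for name, cells in gated.items()]

# With non-overlapping gates, the summed frequencies give the share of cells accessed.
coverage = sum(row["percent_frequency"] for row in database)
print(database[0], coverage)
```

With non-overlapping gates, `coverage` can be compared directly with the 95% target mentioned above.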

In this effort it is, for example, empirically advisable to use self-adapting and contiguous gates for the automated evaluation of cell populations that are well known in flow cytometry, such as lymphocytes, monocytes or granulocytes, as defined by forward (FSC) or sideward (SSC) light scatter or by typical antigenic properties like the expression of CD45 antigen. The subsequent fluorescence gating can equally be automated by standard quadrant evaluation at fixed threshold levels in the gated two-parameter histograms. It is of no primary importance at this stage that cell population boundaries are respected, since relevant information will be picked up anywhere by the subsequent data sieving analysis for the most discriminatory data patterns, provided the information of more than 95% of all cells has been accessed during the information collection phase.
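The quadrant evaluation at fixed thresholds might be sketched as follows. The function name `quadrant_percentages`, the threshold values and the example events are hypothetical. Because the four quadrants are contiguous and cover the whole two-parameter histogram, every gated cell is assigned to exactly one quadrant, whether or not the thresholds follow population boundaries.

```python
def quadrant_percentages(events, x_threshold, y_threshold):
    """Percent of events in each quadrant of a two-parameter histogram.

    Keys give the sign for (x, y): "+" at or above the threshold, "-" below it.
    """
    counts = {"--": 0, "+-": 0, "-+": 0, "++": 0}
    for x, y in events:
        key = ("+" if x >= x_threshold else "-") + ("+" if y >= y_threshold else "-")
        counts[key] += 1
    n = len(events)
    if n == 0:
        return {key: 0.0 for key in counts}
    return {key: 100.0 * value / n for key, value in counts.items()}

# Hypothetical fluorescence pairs (for example two antibody stains) of four gated cells.
events = [(10.0, 5.0), (200.0, 8.0), (15.0, 300.0), (250.0, 400.0)]
quadrants = quadrant_percentages(events, x_threshold=100.0, y_threshold=100.0)
print(quadrants)
```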

Bioinformatic knowledge extraction

Knowledge extraction after generalised information collection is an essential task. The collected information may easily amount to several thousand data columns per set of measurements. The classification of such numbers of data columns by statistics, principal component analysis, fuzzy logic, neural network or self-organizing matrix analysis is frequently beyond the capacity of typical software packages. Classification results from these analysis strategies may furthermore depend to some degree on the assumption of certain mathematical distributions of parameter values, or on predefined levels of correlation coefficients in the case of cluster analysis. Missing experimental values may have to be reconstituted by assumption, or data records have to be discarded, which may influence the final classification result.

Data sieving as an alternative, non-statistical knowledge extraction strategy does not require mathematical assumptions on value distributions, and missing values do not have to be reconstituted. The analysis is suitable for parallel computation and inherently fast because only data thresholding is required for classification.
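The text does not specify how the thresholds are chosen or how patterns are compared, so the following is only one possible reading of threshold-based classification, not the authors' implementation. It assumes that the lower and upper thresholds per data column are the 10th and 90th percentiles of a reference group, that each value is reduced to "-", "0" or "+", that missing values are carried along as "?" and simply skipped, and that a sample is assigned to the class mask it matches best. All column names, values and masks are hypothetical.

```python
from statistics import quantiles

MISSING = "?"

def thresholds(reference_values):
    """Assumed thresholds: 10th and 90th percentile of the reference group."""
    cuts = quantiles(reference_values, n=10)
    return cuts[0], cuts[-1]

def sieve(value, low, high):
    """Reduce one value to '-', '0' or '+'; missing values stay missing."""
    if value is None:
        return MISSING
    if value < low:
        return "-"
    if value > high:
        return "+"
    return "0"

def pattern(record, columns, limits):
    """Data pattern of one sample across all columns."""
    return "".join(sieve(record.get(col), *limits[col]) for col in columns)

def match_score(sample_pattern, mask):
    """Fraction of non-missing positions at which pattern and class mask agree."""
    pairs = [(p, m) for p, m in zip(sample_pattern, mask) if p != MISSING]
    return sum(p == m for p, m in pairs) / len(pairs) if pairs else 0.0

columns = ["cd45_mean", "ph_ratio", "ssc_cv"]
reference = {
    "cd45_mean": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "ph_ratio": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0],
    "ssc_cv": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0],
}
limits = {col: thresholds(reference[col]) for col in columns}

# One sample with a missing value; nothing is imputed.
patient = {"cd45_mean": 12.0, "ph_ratio": None, "ssc_cv": 1.0}
patient_pattern = pattern(patient, columns, limits)

# Hypothetical class masks.
masks = {"reference": "000", "disease": "+0-"}
best = max(masks, key=lambda name: match_score(patient_pattern, masks[name]))
print(patient_pattern, best)
```

Each column is thresholded independently of the others, which is why this kind of procedure lends itself to parallel computation.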

Relational cell classification system

Multiparametric flow cytometers or microscopes are complex instruments, and no two instruments will provide identical results on a given sample, even when built from the same parts. This is caused by the tolerances in the multitude of electronic, optical and mechanical components of such instruments. Fluorescence and light scatter signals are measured on relative scales, and cell population oriented histogram gating remains to some extent arbitrary. These errors of accuracy mostly cancel when all parameter values are expressed relationally, as fractions of the means of the results from suitable reference groups. The relational expression conserves the relative individual positions of the parameter means and their coefficients of variation as a measure of the dispersion of the parameter value distributions.
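The cancellation of instrument-specific scale factors can be shown with a short sketch. The function name `relational`, the parameter name and the two hypothetical instruments, which differ only by a gain factor of 0.5, are assumptions for illustration.

```python
from statistics import mean

def relational(sample, reference_group):
    """Express each parameter as a fraction of the reference group mean."""
    ref_means = {p: mean(ref[p] for ref in reference_group) for p in sample}
    return {p: sample[p] / ref_means[p] for p in sample}

# Instrument A: reference individuals and one patient, in arbitrary units.
reference_a = [{"cd45_mean": 100.0}, {"cd45_mean": 120.0}]
patient_a = {"cd45_mean": 220.0}

# Instrument B measures the same samples at half the gain.
gain = 0.5
reference_b = [{p: v * gain for p, v in ref.items()} for ref in reference_a]
patient_b = {p: v * gain for p, v in patient_a.items()}

rel_a = relational(patient_a, reference_a)
rel_b = relational(patient_b, reference_b)

# Both instruments report the patient at twice the reference mean.
print(rel_a, rel_b)
```

A purely multiplicative difference between instruments cancels exactly in this sketch. Real instruments also differ in offsets and nonlinearities, which is presumably why the text says the errors "mostly" cancel.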

Reference groups of the same type, when established in different laboratories, will be indistinguishable when classified against each other, provided they are composed of representative reference individuals or samples and are measured with long-term precision and specific reagents. Reference groups can be defined by consensus. In this way, the standardised and laboratory-independent classification of relational data is possible, and relational databases from different laboratories can be merged into larger standard databases. If the classification reveals differences between correctly composed reference groups from various laboratories, this indicates methodological or ethnic differences.

A relational system for the objective molecular description of diseases and of elementary cellular states like differentiation, maturation, division or malignancy at the cellular level can then be established in this way. Different cell types will be in a standardised relation to each other in a kind of periodic system of cells.

Human cytome project

With these considerations in mind, three major levels of a human cytome project can be distinguished at present.

The first level addresses the behaviour of cells in their life cycle, including cell cycle control, biomolecule synthesis, import and export of molecules, energy and oxido-reductive balance, and organelle functions, to name only some important phenomena. It also seems important to address the very significant dispersion of many cell population parameters, ranging from 1:10 to 1:10,000, as evidenced by the coefficients of variation of their value distribution curves. The multiparametric molecular heterogeneity of cells in their combinatorial multiplicity may be of high importance for the reliable adaptation of cells to new conditions and for the susceptibility or resistance to diseases or therapy.

At the second level, single cell preparations, either as collected or after mechanical or enzymatic preparation, are investigated by flow or image cytometry to determine the molecular status of normal or diseased cells as descriptors of health and disease. This discrimination does not necessarily depend on the representative original in-situ assembly of cells within tissues, provided conclusions can be derived from the molecular status of specific cells or cell combinations.

The third level concerns cells in assembled tissues. The molecular interrelation and proximity of cells within an intact cellular architecture can be studied under the most complex conditions and at the ultimate organisational level of cell systems in health and disease.


A multitude of projects can be conceptualized for the medical and cell biology areas within the framework of a human cytome project. The following list indicates some areas of potential interest.

Medicine

- leukemia/lymphoma: chemotherapy versus stem cell transplantation
- rheumatoid diseases: early detection of patients requiring high or low levels of therapy
- infections: prediction of infection and disease course in newborns, intensive care and elderly patients to apply early preventive therapies
- allergies: detection of predisease sensitization for asthma, eczema, neurodermatitis and others in risk families for early application of preventive therapies

Cell Biology

- stem cell differentiation & cell cycle: standardized description of differentiation and cell cycle phases
- cell proteomics: molecular topology of intracellular proteins
- drug target identification: retrograde disease pathway exploration of molecular disease hotspots


- With diseases emerging from deviations of typical molecular processes in cells or cell systems, a detailed molecular knowledge of the enormous biocomplexity of such systems in normal and disease conditions is required to understand the mechanisms of disease processes.

- The necessary knowledge is obtained from multilevel single cell molecular analysis close to in-vivo conditions in combination with exhaustive bioinformatic knowledge extraction (cytomics).

- Results are stored in a standardised relational knowledge system or framework for scientific hypothesis development as well as for directly medicine-related purposes such as predictive medicine by cytomics, that is, the prediction of the therapy-dependent future disease course in individual patients, personalized therapy, and the search for new pharmaceutical targets.

- The establishment of such a system, using the various single cell oriented molecular technologies in conjunction with specific biomolecule labelling in a specially focused human cytome project, represents a combined challenge to science, medicine and technological innovation.


L5. Valet G, Murphy RF, Robinson JP, Tárnok A, Kriete A. Cytomics - from cell states to predictive medicine. In: Computational Systems Biology, Eds: Kriete A, Eils R, Elsevier, Amsterdam, 2006 p 363-381
L4. Valet G, Tárnok A. Potential and challenges of a human cytome project. JBRHA 18:87-91 (2004)
L3. Valet G, Leary JF, Tárnok A. Cytomics - new technologies: towards a Human Cytome Project. Cytometry 59A:167-171 (2004)
L2. Valet G, Tárnok A. Cytomics in Predictive Medicine. Cytometry 53B:1-3 (2003)
L1. Valet G. Predictive Medicine by Cytomics: Potential and Challenges. JBRHA 16:164-167 (2002)

Evolution of Concept

Meeting lectures 2004/2005:
- FOM2004, Philadelphia, USA
- ISLH2004, Barcelona, Spain
- 9th Workshop 2004, Leipzig, Germany
- ISAC XXII, Montpellier, France
- BSS'2004, Gdansk, Poland
- EWGCCA2004, Mol, Belgium
- Workshop Cell within Cytomics 2004, Caceres, Spain
- 10th Workshop 2005, Leipzig, Germany
- 9th Congr. Soc. Ibérica de Citometria 2005, Porto, Portugal

CD8 2004, ISBN: 1-890473-C6-5

- see: Purdue CD-Series

© 2017 G.Valet
last update: Jan 11,2017
first display: Jan 14,2004