Compositional data (CoDa) are typically defined as vectors of positive components with a constant sum, usually 100% or 1. These conditions render most classical statistical techniques incoherent on compositions, because those techniques were devised for unbounded real vectors. However, many more types of data share the same limitations: as soon as the variables of a data set show the relative importance of some parts of a whole, the data should be considered compositional. Examples of disguised compositions are data presented in ppm, ppb, molarities, or any other concentration units. Aitchison introduced the log-ratio approach to analyse CoDa in the 1980s. His solution was to transform the data vector with a log-ratio transformation and then apply classical techniques to the scores so obtained. This became the foundation of modern CoDa analysis, which is now based on a geometric structure of its own for the simplex, an appropriate representation of the sample space of CoDa. The validity of these considerations is not restricted to CoDa: there are many more data sets whose sample space does not obey the rules of real numbers, or that can be given an alternative, meaningful geometric structure of their own. Examples abound in the natural and social sciences: vectors of positive amounts, functional data, spherical data, ordered variables, etc. CoDa analysis insights may be of good use to scientists working with these data sets, and vice versa.
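As an illustration of the log-ratio idea, the following minimal Python sketch rescales a vector of positive parts to a constant sum and then applies the centred log-ratio (clr) transformation, one of the log-ratio transformations proposed by Aitchison. The function names `closure` and `clr` and the sample values are assumptions chosen for this example; they are not taken from CoDaPack or from any of the R packages.

```python
import math


def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part divided by the geometric mean."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


# Hypothetical three-part composition, given once as proportions and once in ppm.
x_prop = [0.2, 0.3, 0.5]
x_ppm = [200000.0, 300000.0, 500000.0]

scores_prop = clr(x_prop)
scores_ppm = clr(x_ppm)
```

Both inputs describe the same relative information, so their clr scores coincide up to rounding, and the scores of each vector sum to zero. Classical techniques can then be applied to such scores, keeping in mind that clr scores are linearly dependent because of the zero-sum constraint.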

Example of a CoDa-dendrogram using CoDaPack

Practitioners interested in CoDa can find on the website a forum for the exchange of information, material and ideas. Free software is also available there for download: the Compositional Data Package (CoDaPack), which currently implements the most elementary compositional statistical methods. This software is aimed at users from the applied sciences who have no extensive background in using computer packages. R packages for CoDa are also available: zCompositions, compositions, and robCompositions.