# An Introduction to 'Omics 


The variety of data types, and of ways to analyze them, that one may encounter in the bioinformatics literature is staggering. The term 'Omics - which derives from the similarity in scale of these datasets to genomics - describes a variety of data generated by high-throughput techniques in biology. Terms for new 'omic methods have proliferated so rapidly that the blog of Jonathan Eisen has a regularly occurring feature called ['Bad Omics Word of the Day'](https://phylogenomics.blogspot.com/2010/02/worst-new-omics-word-bad-omics-word-of.html) in which he highlights a probably less-than-necessary new 'omics word that has appeared in the literature (e.g. 'museomics' for sequencing from museums; 'recepterome' for the set of all protein receptors in the cell).

That said, a handful of 'omics approaches are very important to know about since they are both useful and appear frequently in the literature. Depending on the question, studies may employ one or more 'Omic methods. 


|  Technique  | Method |  Output / Interpretation |
|-------------|-------------|--------------------------|
|  **Genomics**   | Extract, fragment and sequence cellular DNA from a single organism | What genes are present, gene sequences, nucleotide composition of the genome, non-coding regulatory elements (promoters, binding sites for transcriptional enhancers and repressors, etc.) |
|  **Transcriptomics** | Extract and sequence RNA transcripts from a single organism | Which genes are 'turned on' (that is, actively transcribed in the cell), and their relative contribution to the cellular pool of mRNAs. Presence of a transcript indicates the presence of the corresponding gene in the DNA, but a missing transcript may simply reflect a gene that is present but switched off (so absence of a transcript does not prove a gene is absent). |
| **Shotgun Metagenomics** | Similar to genomics, but instead of using DNA from a single organism, DNA from a whole community of organisms (typically microorganisms) is fragmented and sequenced | Provides information on what genes are in the community as a whole - and therefore can indicate the community's functional capabilities. Can sometimes recover partial genomes of specific community members.|
| **Microbial amplicon sequencing** | Collect DNA from a whole community of microorganisms, use PCR to amplify a specific gene of interest shared by the microbes of interest | The result is many versions of the same gene from all the organisms in the community. Resulting data can identify which microorganisms are present and (depending on the details) provide some information about their phylogenetic relationships| 
| **Environmental DNA (eDNA) sequencing**| Similar to microbial amplicon sequencing, but this term is used more commonly when the target gene is shared by animals (e.g. animal-specific COI primers) | The goal is to identify animals (or, depending on the gene targeted, other organisms) in an environment based on the trace amounts of DNA they leave behind (e.g. counting up which species of frog use a pond). The term may sometimes be used for environmental studies of microbes as well |
| **Metabolomics** | Collect & rapidly freeze a chemical sample. Analyze it using mass spectrometry to try to identify known compounds. | The result gives a picture of what compounds are present, and their relative abundance, in the sample at a given time. Note that a compound that is both produced and degraded rapidly may be very important but not be present in large amounts.|

For example, it is now common for studies to combine multiple forms of 'Omic data - genomic, transcriptomic, proteomic, metabolomic, and microbiome measurements.

The techniques used to generate, say, metabolomic and genomic data are quite different. However, after initial collection of the 'omic data and metadata about it, quality control, and processing, it is common for any of these types of data to be organized into a simple tabular format.

In this chapter, we will talk about some common approaches to analyzing tabular 'omics data, focusing primarily on examples from microbial community studies. We'll develop an intuition for some common ways that microbial communities can change during disease, and develop some simple mathematical formulas for representing those changes. 

We'll also talk about useful Python tools for dealing with tabular data, using the numpy and pandas packages.

# Microbiome Change Scenarios - Core Microbial Ecology Concepts in 5 Cartoons

In order to learn how to work with tabular 'omics data, it is helpful to explore at least one 'omic approach in depth. In this chapter we will focus on marker gene studies of the microbiome (listed as microbial amplicon sequencing in the table above). Using 5 scenarios, we'll explore some of the most common ways that a community of microorganisms can change during disease. From there, we'll have a good intuition for *what* we want to calculate or test, and why it matters biologically. We will then move on to the technical details of *how* to work with this type of data in Python.

<img src="./resources/Scenario_1_specific_pathogen_no_table-01.png" width="400" align="left"  description="A cartoon of multicolored microbes in 6 healthy people or 6 people with a disease. The picture shows 12 boxes (one per person). In each box there are multiple microbes of different types. A table below will list the counts of microbes in each sample if you cannot view them here."> 

### **Microbial Change Scenario 1**. 

This figure shows a cartoon of the types of microorganisms found in samples from 12 people (perhaps determined using microbial amplicon sequencing or shotgun metagenomics). Each patient is a box. Imagine that the six people shown on the left are healthy while the six on the right have some disease that you suspect might be bacterial in origin. (In this case let's imagine these are all samples from separate people).

**Compare the boxes.** Note any observations you make about what's different about the set of microbes in healthy patients vs. diseased patients. From those observations propose a hypothesis for one way microorganisms might be involved in the disease. **What do you think is going on?**



As you consider these differences, it may help to have a table of which microbes are in which samples. Even if you see a trend, it will probably be obvious that having a table of counts will make it easier to communicate any difference you notice.

Let's construct such a table by naming each type of bacterium based on its color and shape, and counting the number in each sample. 

The result is shown below (recall that samples **S1** to **S6** are from healthy people and **S7** to **S12** are from people with a particular disease):

|Microbe/Sample  |S1|S2|S3|S4|S5|S6|S7|S8|S9|S10|S11|S12|
|----------------|--|--|--|--|--|--|--|--|--|---|---|---|
|purple bacillus |3 |4 | 2|2 |0 |4 | 4|4 |0 | 3 | 2 | 2 |
|cyan rod        |4 |1 | 3|1 |1 |3 | 3|3 |4 | 1 | 1 | 4 |
|navy bacillus   |1 |4 | 2|4 |4 |2 | 2|2 |1 | 4 | 4 | 1 |
|green vibrio    |2 |3 | 0|1 |2 |0 | 1|2 |3 | 0 | 2 | 1 |
|red cocci       |0 |0 | 0|0 |0 |0 | 1|4 |2 | 2 | 3 | 4 |


You've probably noticed at this point that the red cocci are absent in all of the healthy samples, but present in each of the samples from patients with the disease. Therefore it's reasonable to *hypothesize* that the red cocci might be causing the disease.
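The same observation can be checked mechanically once the table is in pandas. The snippet below is a minimal sketch rather than a prescribed workflow: the variable names (`counts`, `healthy`, `diseased`, `candidates`) are our own, and we assume the first six columns are the healthy samples, as stated above.

```python
import pandas as pd

# Counts from the Scenario 1 table. We enter one list per microbe (S1..S12),
# then transpose so that rows are microbes and columns are samples.
samples = [f"S{i}" for i in range(1, 13)]
counts = pd.DataFrame(
    {
        "purple bacillus": [3, 4, 2, 2, 0, 4, 4, 4, 0, 3, 2, 2],
        "cyan rod":        [4, 1, 3, 1, 1, 3, 3, 3, 4, 1, 1, 4],
        "navy bacillus":   [1, 4, 2, 4, 4, 2, 2, 2, 1, 4, 4, 1],
        "green vibrio":    [2, 3, 0, 1, 2, 0, 1, 2, 3, 0, 2, 1],
        "red cocci":       [0, 0, 0, 0, 0, 0, 1, 4, 2, 2, 3, 4],
    },
    index=samples,
).T

# Assumption from the text: S1-S6 are healthy, S7-S12 are diseased.
healthy = counts[samples[:6]]
diseased = counts[samples[6:]]

# A microbe is a candidate if it is absent from every healthy sample
# and present in every diseased sample.
absent_in_healthy = (healthy == 0).all(axis=1)
present_in_all_diseased = (diseased > 0).all(axis=1)
candidates = counts.index[absent_in_healthy & present_in_all_diseased].tolist()

print(candidates)
```

With the table above, only the red cocci satisfy both conditions. Note that this encodes the *observation* only; it says nothing about whether the red cocci cause the disease.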

#### A side note: separating observation from interpretation

It is important at this point to separate our *observations* from their *interpretation*. The first statement we made, that the red cocci were absent from the healthy samples but present in all the diseased ones, was an observation. The second, that *perhaps* the red cocci might be causing the disease, was a potential interpretation. 

The same observation can, of course, be consistent with many interpretations. For example, imagine that the red cocci were kept out of the gut in healthy people, but some disease allowed the red cocci to grow there. The red cocci may not *cause* this disease - or any disease - but may still be *correlated* with it if they are present as a consequence of the disease. 

Often, carefully designed follow-up experiments are needed to test the different potential causes for an observed correlation. 

#### Koch's postulates

Scenarios like the one above, where a single microorganism causes a disease, are classic examples of pathogenesis. They form the basis of [Koch's postulates](https://en.wikipedia.org/wiki/Koch%27s_postulates), one way of demonstrating that a specific bacterium causes a disease using an animal model. 

Koch's postulates demonstrate that a particular microbe causes disease by:
1. Isolating the microbe in pure culture (i.e. growing it alone on a Petri dish) from a diseased animal, but showing that it cannot be isolated from healthy individuals.
2. Reinoculating a healthy animal with that pure culture and observing that it causes disease.
3. Reculturing the same microbe from the diseased animal.

In cases where each of these steps is fulfilled, few would argue that the results are inconclusive. However, there are many cases where Koch's postulates cannot be used (at least, in their original form) to test whether a microbe causes a disease. Thus, while successfully demonstrating that a pathogen fulfills Koch's postulates demonstrates that it causes a disease, failure to fulfill Koch's postulates does not prove that it does not cause that disease. 

Reasons a true pathogen may not fulfill Koch's postulates include:
- The pathogen can't be grown in pure culture
- The pathogen doesn't infect animals other than humans, and so step 2 can't ethically be tested
- The pathogen is carried asymptomatically by many individuals, but only causes disease in a few. 

<img src="./resources/Scenario_2_community_shift_no_table-01.png" width="400" align="left"  description="A cartoon of multicolored microbes in 6 healthy people or 6 people with a disease. The picture shows 12 boxes (one per person). In each box there are multiple microbes of different types. A table below will list the counts of microbes in each sample if you cannot view them here."> 

### **Microbial Change Scenario 2**. 

As in scenario 1, above, the figure to the left shows a cartoon of the types of microorganisms found in samples from 12 people (perhaps determined using microbial amplicon sequencing or shotgun metagenomics). **Compare the boxes.** Note any observations you make about what's different about the set of microbes in healthy patients vs. diseased patients. In case it is useful, a data table of the same microbes is below:

|Microbe/Sample  |S1|S2|S3|S4|S5|S6|S7|S8|S9|S10|S11|S12|
|----------------|--|--|--|--|--|--|--|--|--|---|---|---|
|purple spirochaete |2 |3 | 3|3 |2 |2 | 0|1 |1 | 0 | 1 | 0 |
|cyan bacillus        |2 |2 | 2|2 |3 |3 | 3|2 |3 | 2 | 2 | 3 |
|green rod   |0 |1 | 0|0 |0 |1 | 3|2 |2 | 3 | 2 | 3 |
|light blue vibrio    |3 |2 | 3|2 |2 |3 | 1|1 |0 | 0 | 0 | 1 |
|yellow flagellated microbe       |0 |0 | 1|1 |0 |1| 2|3 |2 | 3 | 3 | 2 |

From your observations of the cartoon and its data table propose a hypothesis for one way microorganisms might be involved in the disease. **What do you think is going on?**

Here are some observations that I might make looking at this table. For me, it helps to list out each direct observation before trying too hard to interpret them.

- Each of these types of microbe is present in at least one healthy patient and one diseased patient
- One microbe (the cyan-colored bacillus) seems to be present in roughly equal abundance across both diseased and healthy patients (2-3 counts per sample).
- Despite these similarities there seem to be overall differences in the **community of microbes** in healthy vs. diseased patients: samples from healthy patients mostly have more purple spirochaetes and light blue vibrios, whereas diseased patients have more of the yellow flagellated microbes and green rods. In the literature you will hear this called a 'change in microbiome composition'.
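One way to make these observations concrete is to average each microbe's count within each group. The following is a small sketch using pandas; the names `counts`, `group` and `mean_counts` are ours, and the healthy/diseased labels simply follow the S1-S6 / S7-S12 split used in the cartoon.

```python
import pandas as pd

# Counts from the Scenario 2 table. Here rows are samples and columns are
# microbes, which makes grouping samples by health status straightforward.
samples = [f"S{i}" for i in range(1, 13)]
counts = pd.DataFrame(
    {
        "purple spirochaete":         [2, 3, 3, 3, 2, 2, 0, 1, 1, 0, 1, 0],
        "cyan bacillus":              [2, 2, 2, 2, 3, 3, 3, 2, 3, 2, 2, 3],
        "green rod":                  [0, 1, 0, 0, 0, 1, 3, 2, 2, 3, 2, 3],
        "light blue vibrio":          [3, 2, 3, 2, 2, 3, 1, 1, 0, 0, 0, 1],
        "yellow flagellated microbe": [0, 0, 1, 1, 0, 1, 2, 3, 2, 3, 3, 2],
    },
    index=samples,
)

# Sample metadata: which group each sample belongs to.
group = pd.Series(["healthy"] * 6 + ["diseased"] * 6, index=samples, name="group")

# Mean count of each microbe within each group (microbes as rows).
mean_counts = counts.groupby(group).mean().T
print(mean_counts.round(2))
```

In the resulting table the cyan bacillus has nearly the same mean in both groups, while the other four microbes differ several-fold between healthy and diseased samples, matching the list above.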

#### Example: A shift in the proportion of Firmicutes vs. Bacteroidetes in the human gut microbiome during obesity.

The pattern we saw above in scenario 2 is similar to patterns that researchers are seeing in many studies of the microbiome and disease.

One classic example where this pattern was observed came not from a disease, but rather from the study of microbiome change during weight gain. Dr. Ruth Ley and colleagues used 16S rRNA amplicon sequencing to study the gut microbiome of mice that were predisposed to obesity (*ob*/*ob* mice, where *ob* indicates the presence of an allele or gene variant that predisposes the mouse to obesity) [1]. In those mice, the relative abundance of one whole phylum of microorganisms, the Bacteroidetes, declined by ~50%, while another phylum, the Firmicutes, increased in proportion by a similar amount. Later work showed that there were differences in the ratio of Bacteroidetes to Firmicutes in people who were obese as well. When people who were obese went on fat- or carbohydrate-restricted diets and lost weight, this ratio shifted to be more similar to that typically seen in people who were lean [2].

This real-world observation is similar to the shift in the abundance of different groups of microbes up above.

#### Shifts in microbiome composition with disease have multiple possible explanations

From these observations, we might form several hypotheses, along similar lines to those we considered for Scenario 1:

- **Hypothesis 1**: Perhaps a shift in which microbes are most abundant in the gut of certain people caused them to develop the disease.  A change in microbiome composition that causes disease can be called 'dysbiosis'. In other words, microbiome composition -> disease.
- **Hypothesis 2**: Alternatively, perhaps the disease itself caused a change in which microbes are most abundant, but that change in the microbiome did not contribute to the patients' symptoms. In other words, disease -> microbiome composition.
- **Hypothesis 3**: Some external factor (e.g. exposure to a certain pollutant) both caused the disease and altered the microbiome of the diseased patients.

It is worth noting that these hypotheses are not *mutually exclusive*. That means that more than one might be correct at the same time. For example, it is possible for an environmental factor to disturb the microbiome, which in turn can affect the symptoms that a patient experiences, and those symptoms may themselves allow for further changes in the microbiome.

You will probably see that these closely parallel our hypotheses for the red cocci in Scenario 1. Just like in that case, careful experiments and analyses are needed to try to distinguish these possibilities.

If you have studied statistics, you will know that there is one very important hypothesis that we have not yet considered - the *null hypothesis* that there are no real average differences between diseased and healthy patients. Under this hypothesis these two sets of samples only *look* different because we haven't considered enough samples. 

For example, if we flipped a fair coin 6 times and counted heads for the healthy patients, then flipped it 6 more times for the diseased patients we could very easily get different numbers of heads.  

- Null hypothesis: Although each patient has a different microbiome, there is no difference, on average, in the microbial communities of healthy and diseased patients.

Typically, statistical tests are used to figure out the probability of getting an apparent difference in the microbiome composition of healthy and diseased patients as large as the one we observed under the null hypothesis (i.e. due to chance).  We'll talk much more about statistical hypothesis testing later on.
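To make the idea of 'due to chance' concrete, here is a small sketch of one simple test, a permutation test, applied to the green rod counts from Scenario 2. This is our own illustrative choice, not necessarily the test we will use later. The logic: if health status truly did not matter, any 6 of the 12 samples could equally well have been labelled 'healthy', so we try every possible relabelling and ask how often the difference in group means is at least as large as the one we actually observed.

```python
from itertools import combinations

# Green rod counts for S1..S12 from the Scenario 2 table
# (S1-S6 healthy, S7-S12 diseased).
green_rod = [0, 1, 0, 0, 0, 1, 3, 2, 2, 3, 2, 3]
n_healthy = 6


def mean(values):
    return sum(values) / len(values)


def abs_mean_difference(values, healthy_idx):
    """Absolute difference in mean count between the two groups."""
    healthy = [values[i] for i in healthy_idx]
    diseased = [v for i, v in enumerate(values) if i not in healthy_idx]
    return abs(mean(diseased) - mean(healthy))


# The difference we actually observed, using the real labels.
observed = abs_mean_difference(green_rod, set(range(n_healthy)))

# Every way of choosing which 6 of the 12 samples are called 'healthy'.
relabelings = list(combinations(range(len(green_rod)), n_healthy))

# Count relabelings whose difference is at least as extreme as observed
# (the small tolerance guards against floating-point rounding).
as_extreme = sum(
    abs_mean_difference(green_rod, set(idx)) >= observed - 1e-12
    for idx in relabelings
)
p_value = as_extreme / len(relabelings)

print(f"{as_extreme} of {len(relabelings)} relabelings -> p = {p_value:.4f}")
```

For this one microbe, only 2 of the 924 possible relabelings give a difference as large as the real one, so chance alone is an unlikely explanation. A real analysis would also need to account for testing many microbes at once, which this sketch ignores.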

#### Testing whether a change in microbiome composition causes disease using germ-free mice

One way that researchers might experimentally distinguish hypothesis 1 (microbiome composition -> disease) from hypothesis 2 (disease -> microbiome composition) is by experimentally changing the microbiome and seeing if disease develops. This is conceptually similar to the idea from Koch's postulates of adding a pathogen to a healthy animal and seeing if disease develops. For shifts in microbial community composition, however, it is not just a single pathogen that must be added, but rather a whole community of microbes. These types of experiments are often done in *gnotobiotic* animals: animals raised in sterile enclosures so that they carry either no microbes at all (germ-free) or only microbes that the researchers have deliberately added. 

For example, when Ley *et al.* [2] found that an increase in Firmicutes relative to Bacteroidetes was associated with obesity, it was not yet clear whether this shift contributed in any way to obesity, or if it was just a harmless consequence of being obese. To test these possibilities, experiments led by Dr. Pete Turnbaugh [3] transplanted microbial communities from either lean or obese mice into gnotobiotic mice that did not have any microbes of their own. They found that simply adding the microbial community from the obese mice was enough to trigger increased weight gain in otherwise normal mice (i.e. consistent with Hypothesis 1 above). This suggests that changes in the gut microbiome aren't just a consequence of weight gain, but can also contribute to weight gain. 


<img src="./resources/Scenario_3_microbial_overgrowth_no_table-01.png" width="400" align="left"  description="A cartoon of multicolored microbes in 6 healthy people or 6 people with a disease. The picture shows 12 boxes (one per person). In each box there are multiple microbes of different types. A table below will list the counts of microbes in each sample if you cannot view them here."> 

### **Microbial Change Scenario 3**. 

As in scenario 1, above, the figure to the left shows a cartoon of the types of microorganisms found in samples from 12 people (perhaps determined using microbial amplicon sequencing or shotgun metagenomics). 

|Microbe/Sample  |S1|S2|S3|S4|S5|S6|S7|S8|S9|S10|S11|S12|
|----------------|--|--|--|--|--|--|--|--|--|---|---|---|
|purple spirochaete |1 |1 | 3|0 |2 |2 | 4|3 |4 | 3 | 0 | 8 |
|cyan bacillus |3 |2 | 0|3 |1|2 | 2|6 |4 | 9 | 6 | 0 |
|green rod   |1 |1 | 0|2 |1 |0 | 4|3 |0 | 3 | 6 | 0 |
|brown flagellated microbe       |0 |0 | 1|0 |0 |1| 0|0 |2 | 0 | 0 | 6 |



From your observations of the cartoon and its data table propose a hypothesis for one way microorganisms might be involved in the disease. **What do you think is going on?**

#### Changes in overall microbiome abundance

In this case one very clear pattern might emerge quickly - there are simply many more microbes in the diseased patients than in the healthy ones. For example, in [Small Intestinal Bacterial Overgrowth (SIBO)](https://en.wikipedia.org/wiki/Small_intestinal_bacterial_overgrowth) the density of bacteria in the small intestine increases from < 10,000 (10^4) per milliliter of intestinal fluid in healthy individuals to > 100,000 (10^5) per milliliter in people with SIBO. This change is associated with a variety of unpleasant symptoms, including nausea and malnutrition.
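For the cartoon data, where we counted every microbe directly, the overall abundance in each sample is just the column total of the table. The sketch below computes those totals with pandas (the variable names are ours). Keep in mind that this works here only because these are direct counts; counts of DNA sequences behave differently.

```python
import pandas as pd

# Counts from the Scenario 3 table (rows = microbes, columns = samples).
samples = [f"S{i}" for i in range(1, 13)]
counts = pd.DataFrame(
    {
        "purple spirochaete":        [1, 1, 3, 0, 2, 2, 4, 3, 4, 3, 0, 8],
        "cyan bacillus":             [3, 2, 0, 3, 1, 2, 2, 6, 4, 9, 6, 0],
        "green rod":                 [1, 1, 0, 2, 1, 0, 4, 3, 0, 3, 6, 0],
        "brown flagellated microbe": [0, 0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 6],
    },
    index=samples,
).T

# Total number of microbes counted in each sample.
totals = counts.sum(axis=0)

# Average total for healthy (S1-S6) and diseased (S7-S12) samples.
healthy_mean = totals[samples[:6]].mean()
diseased_mean = totals[samples[6:]].mean()

print(totals.to_dict())
print(f"healthy mean: {healthy_mean:.1f}, diseased mean: {diseased_mean:.1f}")
```

Healthy samples contain 4-5 microbes each, while diseased samples contain 10-15, even though no single microbe type is consistently responsible for the increase.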

Interestingly, current DNA sequencing methods don't reliably detect changes in the overall number of microbes in a sample. This is because DNA extraction and sequencing reactions vary each time in their effectiveness. Therefore the absolute number of DNA sequences that you get back per sample from a DNA sequencing machine varies from sample to sample, even if the same amount of DNA is used as an input. As such, the raw count of how many DNA sequences you get back won't tell you if the *overall count* of bacteria has increased. Instead, DNA sequencing is useful for telling you the *relative abundance* (percentage) of each type of bacteria in a sample. 

This problem, which is known as *compositionality*, has many important implications for statistical analysis of DNA sequence data, since most commonly-used statistical methods assume that observations are independent (don't depend on one another). Percentages of a total, however, are not independent. If there are just two microbes, A and B, and A goes down, then even if the *count* of B does not change, the *percentage* of the community that is B will increase.
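The two-microbe example can be worked through in a few lines. The counts below are made up purely for illustration, and `relative_abundance` is a helper we define here rather than a library function.

```python
def relative_abundance(counts):
    """Convert a dict of raw counts into fractions of the sample total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical absolute counts: microbe A declines, microbe B does not change.
before = {"A": 50, "B": 50}
after = {"A": 10, "B": 50}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

print(f"B count:    {before['B']} -> {after['B']}")
print(f"B fraction: {rel_before['B']:.2f} -> {rel_after['B']:.2f}")
```

The count of B stays at 50, yet its share of the community rises from 50% to about 83%. If we could only see the percentages, as with sequencing data, we might wrongly conclude that B had bloomed.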

Finding the best methods to detect and work with data that looks like the pattern in this scenario is an active area of research. Some current approaches include: 

- Using a device called a flow cytometer to count the number of bacteria in each sample directly. However, this requires that the sample be processed so that the bacteria can be physically separated from one another and other substances in the sample. As such, it tends to work better for liquid samples (e.g. seawater) and to be more challenging for tissue samples.
- Some researchers using shotgun metagenomics have tried to address the problem of compositionality (see above) by adding in known quantities of artificially synthesized DNA fragments to their samples. These are sometimes called sequins (**sequ**encing spike-**in** control**s**) - not to be confused with the sequins on clothes! Even though each sample will generate more or fewer sequences relative to other samples, the proportion of types of sequences within any given sample should stay the same. By adding sequins at known concentration, it is in principle possible to infer the absolute abundance of all the other input DNA sequences.
- Much current research in computational and statistical analysis of microbiome and other 'omics data is focused on developing better statistical methods for dealing with compositional data (which is inherently non-independent and therefore presents problems for most 'normal' statistical methods like t-tests or ANOVAs).
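The arithmetic behind the spike-in idea in the list above can be sketched as follows. This is a deliberately simplified illustration, not a published protocol: the function name, sample names and all numbers are hypothetical, and we assume that reads are proportional to the DNA copies that went into sequencing, which real data only approximate.

```python
def estimate_absolute_copies(read_counts, spike_in_name, spike_in_copies_added):
    """Scale read counts to estimated DNA copies using a spike-in control.

    Assumes (hypothetically) that every DNA copy, whether spike-in or
    microbial, is equally likely to be sequenced.
    """
    spike_reads = read_counts[spike_in_name]
    if spike_reads == 0:
        raise ValueError("No spike-in reads recovered; cannot scale this sample.")
    copies_per_read = spike_in_copies_added / spike_reads
    return {
        name: reads * copies_per_read
        for name, reads in read_counts.items()
        if name != spike_in_name
    }


# Two made-up samples with the same number of reads (1000 each) and the same
# 50:50 ratio of taxon_1 to taxon_2, each spiked with 1000 copies of control DNA.
sample_a = {"taxon_1": 400, "taxon_2": 400, "spike_in": 200}
sample_b = {"taxon_1": 490, "taxon_2": 490, "spike_in": 20}

abs_a = estimate_absolute_copies(sample_a, "spike_in", spike_in_copies_added=1000)
abs_b = estimate_absolute_copies(sample_b, "spike_in", spike_in_copies_added=1000)

print(abs_a)
print(abs_b)
```

Because the spike-in makes up a much smaller share of the reads in the second sample, that sample must have contained far more microbial DNA to begin with, even though the relative abundances of the two taxa are identical.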




<img src="./resources/Scenario_4_alpha_diversity_with_table_r2-01.png" width="400" align="left"  description="A cartoon of multicolored microbes in 6 healthy people or 6 people with a disease. The picture shows 12 boxes (one per person). In each box there are multiple microbes of different types. A table below will list the counts of microbes in each sample if you cannot view them here."> 

### **Microbial Change Scenario 4**. 

The figure to the left shows a cartoon of the types of microorganisms found in samples from 12 people (perhaps determined using microbial amplicon sequencing or shotgun metagenomics). 

|Microbe/Sample            |S1|S2|S3|S4|S5|S6|S7|S8|S9|S10|S11|S12|
|--------------------------|--|--|--|--|--|--|--|--|--|---|---|---|
|purple spirochaete        |1 |2 | 2|1 |1 |1 | 2|6 |4 | 0 | 2 | 4 |
|cyan bacillus             |2 |0 | 3|2 |2 |2 |6 |0| 2 | 6 | 4 | 4 |
|teal vibrio               |4 |0 | 0|4 |2 |0 | 0|4 |0 | 0 | 0 | 2 |
|green rod                 |1 |1 | 1|0 |2 |3 | 0|0 |0 | 0 | 0 | 0 |
|brown flagellated microbe  |1 |1 | 1|2 |1 |1 | 0|0 |0 | 0 | 0 | 0 |
|gold flagellated microbe  |1 |1 | 1|1 |1 |1 | 0|0 |0 | 0 | 0 | 0 |
|purple cocci |1 |2| 1|2 |1 |1 | 0|0 |0 | 0 | 0 | 0 |
|red cocci |0 |4| 3|0 |0|2| 2|0 |4 | 4 | 2 | 0 |



From your observations of the cartoon and its data table propose a hypothesis for one way microorganisms might be involved in the disease. **What do you think is going on?**

#### Changes in Microbiome Alpha Diversity

There are several differences between diseased and healthy patients up above. One that may jump out at you is that the diseased patients all have many fewer species of microbes than do the healthy patients. That is, the communities of microbes in the diseased patients are *less diverse* in terms of the species they contain relative to those in the healthy patients. If we count the species present in samples S7-S12, we see that each has 2-3 species. In contrast, samples S1-S6 from healthy patients have representation from 6-7 species.

The difference in microbial community diversity between healthy and diseased patients is analogous to the difference between two grassland communities, one of which has many fewer species.

Ecologists have a special term for this type of change in diversity within each sample: alpha diversity. The term alpha diversity was coined by R.H. Whittaker [4]. Alpha diversity measures the diversity within each sample. The simplest measurement of alpha diversity is simply to count up how many species are in each sample. Note that this assumes all the samples have been equally studied - an assumption that is often violated in DNA sequencing studies and to which we will return later.
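Counting species per sample is easy to do with pandas. The sketch below applies it to the Scenario 4 table; the variable names are ours, and like the text it assumes each sample was examined equally thoroughly.

```python
import pandas as pd

# Counts from the Scenario 4 table (rows = microbes, columns = samples).
samples = [f"S{i}" for i in range(1, 13)]
counts = pd.DataFrame(
    {
        "purple spirochaete":        [1, 2, 2, 1, 1, 1, 2, 6, 4, 0, 2, 4],
        "cyan bacillus":             [2, 0, 3, 2, 2, 2, 6, 0, 2, 6, 4, 4],
        "teal vibrio":               [4, 0, 0, 4, 2, 0, 0, 4, 0, 0, 0, 2],
        "green rod":                 [1, 1, 1, 0, 2, 3, 0, 0, 0, 0, 0, 0],
        "brown flagellated microbe": [1, 1, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0],
        "gold flagellated microbe":  [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
        "purple cocci":              [1, 2, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0],
        "red cocci":                 [0, 4, 3, 0, 0, 2, 2, 0, 4, 4, 2, 0],
    },
    index=samples,
).T

# Observed richness: the number of microbe types with a non-zero count.
# (counts > 0) gives a True/False table; summing down each column counts the Trues.
richness = (counts > 0).sum(axis=0)

print(richness.to_dict())
```

This reproduces the pattern described above: 6-7 species in each healthy sample (S1-S6) versus 2-3 in each diseased sample (S7-S12).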

Alpha diversity was distinguished conceptually from beta-diversity, defined as the *difference*  or *turnover* in species between pairs of communities. Whittaker's core idea was this: if you wanted to understand the diversity of a whole landscape, you should characterize both the alpha diversity of individual plots - this one has 6 species of grasses, that one has 7, etc - and the turnover of species *between* plots - this plot shares 5/7 species with that one. By combining these values one could estimate the overall landscape or gamma diversity.

<img src="./resources/richness.png" width="400"  description="A cartoon of two microbial communities, one of which has many more morphologies of microbe than the other and is labelled 'High Richness'">

### Two flavors of alpha diversity - richness and evenness

Alpha diversity can be further divided into two flavors - richness and evenness.

 
**Richness** is the number of different species present in a community. In contrast, evenness asks whether species occur in roughly equal proportions overall (high evenness) or in highly unequal proportions (low evenness).










<img src="./resources/richness_vs_eveness.png" width="300"  description="A cartoon of two pairs of microbial communities. The first pair has only 2 species. Within it, one sample labelled 'High Evenness' has microbes that are roughly equal in count between the two species. In the other, labelled 'Low Evenness', the microbes are very different in count (one has many counts, the other has few counts). A second pair of samples shows the same concept, but for a pair of samples with many more species. This illustrates that richness and evenness are not the same thing, and a sample with high or low richness can have high or low evenness."> 

**Evenness** has to do with how counts of species are distributed across species. A community in which all species are in roughly similar abundance is considered 'high evenness', whereas a community in which some species are very abundant while others are rare is considered 'low evenness'.


The reasons that richness and evenness are interesting may be clearer if we consider an economic example. When comparing countries we might want to consider some measure of the overall size of their economy (e.g. Gross Domestic Product or GDP) and income inequality (e.g. as measured by the Gini Index). Both large and small economies could be equal or unequal in their distribution of income. In this analogy, species richness would be similar to GDP while evenness would be more similar to income inequality.

An example of microbial communities with high and low richness and evenness is shown above.
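To see that richness and evenness really are separate quantities, we can compute both for a few made-up communities. The sketch below uses Pielou's evenness (the Shannon diversity index divided by its maximum possible value for that number of species) as one common way to put a number on evenness; that choice, the function names and the example counts are our own assumptions rather than something defined earlier in this chapter.

```python
import numpy as np


def richness(counts):
    """Number of species with a non-zero count."""
    counts = np.asarray(counts)
    return int((counts > 0).sum())


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln(p)) over the species present."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S): 1 means perfectly even."""
    s = richness(counts)
    if s < 2:
        return float("nan")  # evenness is undefined for fewer than 2 species
    return shannon_index(counts) / np.log(s)


# Hypothetical communities (counts per species), loosely mirroring the cartoon.
communities = {
    "low richness, high evenness":  [10, 10],
    "low richness, low evenness":   [18, 2],
    "high richness, high evenness": [4, 4, 4, 4, 4],
    "high richness, low evenness":  [16, 1, 1, 1, 1],
}

for name, community in communities.items():
    print(f"{name}: richness={richness(community)}, "
          f"evenness={pielou_evenness(community):.2f}")
```

The two perfectly even communities both score an evenness of 1 despite having different richness, while the two uneven communities score well below 1. Either quantity can be high or low independently of the other.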

<img src="./resources/Scenario_5_beta_diversity_no_table-01.png" width="400" align="left"  description="A cartoon of multicolored microbes in 6 healthy people or 6 people with a disease. The picture shows 12 boxes (one per person). In each box there are multiple microbes of different types. A table below will list the counts of microbes in each sample if you cannot view them here."> 

### **Microbial Change Scenario 5**. 

The figure to the left shows a cartoon of the types of microorganisms found in samples from 12 people (perhaps determined using microbial amplicon sequencing or shotgun metagenomics). 

|Microbe/Sample             |S1|S2|S3|S4|S5|S6|S7|S8|S9|S10|S11|S12|
|---------------------------|--|--|--|--|--|--|--|--|--|---|---|---|
|purple flagellated microbe |1 |1 | 1|1 |1 |1 | 6|0 |0 | 6 | 3 | 0 |
|navy flagellated microbe   |1 |1 | 1|0 |1 |2 | 0|0 |7 | 0 | 1 | 0 |
|gold flagellated microbe   |1 |1 | 1|2 |1 |1 | 0|7 |0 | 0 | 0 | 7 |
|teal cocci                 |6 |7 | 7|7 |7 |7 | 4|3 |3 | 4 | 6 | 3 |



From your observations of the cartoon and its data table propose a hypothesis for one way microorganisms might be involved in the disease. **What do you think is going on?**

#### Beta-diversity - turnover between communities

Among other changes, you might notice that the communities in the healthy patients are much more *consistent* than those in the diseased patients. While the healthy patients all have relatively similar microbial communities, the diseased patients each have quite different microbial communities from one another. This type of pattern is often measured using beta diversity.

[Beta diversity](https://en.wikipedia.org/wiki/Beta_diversity) measures how different a given pair of ecological communities are. For example, if you were surveying the plant communities in different habitats, you might find that two forested areas had low beta diversity relative to one another, but that each had high beta diversity relative to a grassland community. This is equivalent to saying 'the two forest communities have similar plant species, whereas the grassland community has very different ones'. An important feature of beta diversity is that, unlike alpha diversity, it **can only be calculated for a pair of samples**. It makes no sense to say that the plants in a forest are very 'different from' without finishing the sentence to say *what* they are different from.

When microbial ecologists study microbial communities, they often represent beta diversity in 2D or 3D scatter plots called [Principal Coordinates Analysis](https://en.wikipedia.org/wiki/Multidimensional_scaling#Types) or PCoA plots (PCoA is also known as classical Multidimensional Scaling, so you may see these called MDS plots in the literature). In these plots, pairs of samples with low beta diversity appear closer together, while pairs of samples with higher beta diversity appear farther apart. 

Flip back to the cartoon for scenario 2. In that scenario, we saw that all healthy patients looked similar to other healthy patients, all diseased patients looked similar to other diseased patients, but healthy and diseased patients looked different from one another because some microbes were always more abundant in healthy patients, and some microbes were always more abundant in patients with the disease. A formal way to say this is that there is low intra-class beta-diversity (e.g. healthy vs. healthy), but high inter-class beta-diversity (e.g. healthy vs. diseased). That kind of pattern produces **clustering** in PCoA plots. It is the main pattern microbial ecologists have tended to look for when trying to see if microbial community change is associated with a particular condition or disease.

In contrast, the pattern in scenario 5 is a little different. As before, each of the microbiomes from healthy patients is similar to those of other healthy patients, but - unlike scenario 2 - here each diseased patient has different microbes from most other diseased patients. A more formal way to say this is that disease is associated with greater inter-individual *variation* in microbiome composition: beta diversity is higher among diseased patients than among healthy ones. 
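We can check this pattern numerically by computing a beta diversity value for every pair of samples within each group and averaging them. The sketch below uses Bray-Curtis dissimilarity, one widely used beta diversity measure that ranges from 0 (identical communities) to 1 (no shared species). Choosing Bray-Curtis here is our own assumption, as are the function and variable names.

```python
from itertools import combinations

import numpy as np

# Counts from the Scenario 5 table: rows are microbes, columns are S1..S12.
table = np.array([
    [1, 1, 1, 1, 1, 1, 6, 0, 0, 6, 3, 0],  # purple flagellated microbe
    [1, 1, 1, 0, 1, 2, 0, 0, 7, 0, 1, 0],  # navy flagellated microbe
    [1, 1, 1, 2, 1, 1, 0, 7, 0, 0, 0, 7],  # gold flagellated microbe
    [6, 7, 7, 7, 7, 7, 4, 3, 3, 4, 6, 3],  # teal cocci
])


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two vectors of counts."""
    return np.abs(a - b).sum() / (a + b).sum()


def mean_within_group(columns):
    """Average dissimilarity over all pairs of samples in one group."""
    distances = [
        bray_curtis(table[:, i], table[:, j])
        for i, j in combinations(columns, 2)
    ]
    return float(np.mean(distances))


healthy_cols = range(0, 6)    # S1-S6
diseased_cols = range(6, 12)  # S7-S12

mean_healthy = mean_within_group(healthy_cols)
mean_diseased = mean_within_group(diseased_cols)

print(f"mean healthy-vs-healthy dissimilarity:   {mean_healthy:.2f}")
print(f"mean diseased-vs-diseased dissimilarity: {mean_diseased:.2f}")
```

The average dissimilarity among diseased samples is several times higher than among healthy samples, which is the numerical signature of scenario 5. Note that a couple of diseased pairs in this small table happen to be identical, so the pattern holds on average rather than for every single pair.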



## [Reading Responses and Feedback](https://docs.google.com/forms/d/e/1FAIpQLSeUQPI_JbyKcX1juAFLt5z1CLzC2vTqaCYySUAYCNElNwZqqQ/viewform?usp=pp_url&entry.2118603224=An+Introduction+to+'Omics)



## References

- [1] Ley, R. E. et al. Proc. Natl Acad. Sci. USA 102, 11070–11075 (2005)
- [2] Ley, R. E. et al. Nature 444, 1022–1023 (2006)
- [3] Turnbaugh, P. J. et al. Nature 444, 1027–1031 (2006)
- [4] Whittaker, R. H. "Vegetation of the Siskiyou Mountains, Oregon and California." Ecological Monographs 30, 279–338 (1960). doi:10.2307/1943563