Ordination: NMDS and PCoA
Ordination arranges samples in low-dimensional space so that their positions reflect their multivariate similarities. Samples that are ecologically or compositionally similar end up close together; dissimilar samples end up far apart.
The key difference from PCA is that ordination methods can work with any distance metric, not just Euclidean. This makes them the standard approach for ecological and microbiome data, where distances such as Bray-Curtis or UniFrac are more appropriate than Euclidean distance.
NMDS
Non-metric multidimensional scaling (NMDS) is the most widely used ordination method in ecology and microbiome research. Rather than preserving the exact distances between samples, it preserves their rank order: if sample A is more similar to B than to C in the original data, then A should be closer to B than to C in the plot.
This rank-based approach makes NMDS flexible and robust. It imposes no distributional assumptions and works with any distance metric.
Stress
Stress measures how well the 2D arrangement reflects the original ranked distances. It is the key diagnostic for NMDS quality:
| Stress | Interpretation |
|---|---|
| < 0.05 | Excellent |
| 0.05 to 0.10 | Good |
| 0.10 to 0.20 | Acceptable |
| 0.20 to 0.30 | Weak, interpret with caution |
| > 0.30 | Poor, do not use |
If stress exceeds 0.20, try increasing dimensions to k = 3, or check whether the distance metric and data preparation are appropriate.
Implementation
library(vegan)
library(ggplot2)
data(dune)
data(dune.env)
# Run NMDS with multiple random starts
set.seed(123)
nmds <- metaMDS(dune,
distance = "bray",
k = 2,
trymax = 100,
autotransform = FALSE)
nmds$stress
#> 0.118 — acceptable
# Shepard diagram: visual check of fit
stressplot(nmds)
# Points should follow the step function closely
# Extract scores and plot
scores_df <- as.data.frame(scores(nmds, display = "sites"))
scores_df$Management <- dune.env$Management
ggplot(scores_df, aes(x = NMDS1, y = NMDS2, colour = Management)) +
geom_point(size = 3, alpha = 0.8) +
stat_ellipse(linetype = 2) +
labs(title = paste0("NMDS (stress = ", round(nmds$stress, 3), ")")) +
theme_minimal()
Fitting environmental variables
envfit() projects environmental variables onto the ordination as vectors. The direction of each vector shows which end of the gradient corresponds to high values; the length reflects the strength of the correlation.
env_fit <- envfit(nmds, dune.env, permutations = 999)
env_fit
#> R2 and p-value for each variable
# Plot only significant variables
plot(nmds, type = "n")
points(nmds, display = "sites", pch = 19)
plot(env_fit, p.max = 0.05, col = "red")
Important: NMDS axes are arbitrary
Unlike PCA, NMDS axes have no inherent meaning. Their orientation and scale can change between runs. What matters is the relative position of samples, not their absolute coordinates or the direction of any axis.
PCoA
Principal coordinates analysis (PCoA), also called classical MDS, is closer in spirit to PCA: it produces an eigendecomposition so that the axes explain defined proportions of variance, and the result is deterministic. The key advantage over NMDS is that it preserves exact distances rather than just their ranks.
PCoA is the preferred choice when using phylogenetically informed distances such as UniFrac, where exact distance preservation matters.
library(ape)
dist_bray <- vegdist(dune, method = "bray")
pcoa_result <- pcoa(dist_bray)
# Variance explained
pcoa_result$values$Relative_eig[1:4]
# Extract and plot scores
pcoa_scores <- as.data.frame(pcoa_result$vectors[, 1:2])
names(pcoa_scores) <- c("PCo1", "PCo2")
pcoa_scores$Management <- dune.env$Management
var_exp <- round(pcoa_result$values$Relative_eig[1:2] * 100, 1)
ggplot(pcoa_scores, aes(PCo1, PCo2, colour = Management)) +
geom_point(size = 3) +
labs(title = "PCoA",
x = paste0("PCo1 (", var_exp[1], "%)"),
y = paste0("PCo2 (", var_exp[2], "%)")) +
theme_minimal()
Choosing Between NMDS and PCoA
Both methods are valid for most ecological and microbiome data. The choice is usually minor in practice — if your data have clear structure, both will show it.
| NMDS | PCoA | |
|---|---|---|
| Axes explain defined % variance | No | Yes |
| Deterministic result | No | Yes |
| Preserves exact distances | No | Yes |
| Handles non-Euclidean distances | Yes | Yes |
| Best for UniFrac | Yes | |
| Default for Bray-Curtis ecology | Yes |
When in doubt, run both and compare. Strong disagreement between them can indicate sensitivity to the distance metric or to rare samples.
Common Pitfalls
High stress in NMDS. Do not accept stress above 0.20. Try k = 3, more random starts (trymax = 200), or reconsider the distance metric.
Too few random starts. NMDS can converge to local minima. Always use trymax of at least 100.
Wrong distance for data type. Use Bray-Curtis for species counts and OTU tables, not Euclidean distance.
Over-interpreting axis orientation. NMDS axes can flip between runs. Report sample groupings and stress values, not axis directions.
Exercise
Using the dune dataset:
- Run NMDS with Bray-Curtis distance and check the stress value
- Plot the Shepard diagram and assess fit quality
- Colour the ordination plot by
Use(land use type fromdune.env) - Fit environmental variables and identify which ones are significantly associated with the ordination
Solution
library(vegan)
library(ggplot2)
data(dune)
data(dune.env)
# 1. NMDS
set.seed(42)
nmds <- metaMDS(dune, distance = "bray", k = 2, trymax = 100,
autotransform = FALSE)
nmds$stress
# 2. Shepard diagram
stressplot(nmds)
# 3. Score plot
scores_df <- as.data.frame(scores(nmds, display = "sites"))
scores_df$Use <- dune.env$Use
ggplot(scores_df, aes(NMDS1, NMDS2, colour = Use)) +
geom_point(size = 3) +
stat_ellipse(linetype = 2) +
labs(title = paste0("NMDS (stress = ", round(nmds$stress, 3), ")")) +
theme_minimal()
# 4. Environmental fit
env_fit <- envfit(nmds, dune.env, permutations = 999)
print(env_fit)
plot(nmds, type = "n")
points(nmds, display = "sites", pch = 19,
col = as.numeric(dune.env$Use))
plot(env_fit, p.max = 0.05, col = "red")