Skip to content

Ordination: NMDS and PCoA

Ordination arranges samples in low-dimensional space so that their positions reflect their multivariate similarities. Samples that are ecologically or compositionally similar end up close together; dissimilar samples end up far apart.

The key difference from PCA is that ordination methods can work with any distance metric, not just Euclidean. This makes them the standard approach for ecological and microbiome data, where distances such as Bray-Curtis or UniFrac are more appropriate than Euclidean distance.


NMDS

Non-metric multidimensional scaling (NMDS) is the most widely used ordination method in ecology and microbiome research. Rather than preserving the exact distances between samples, it preserves their rank order: if sample A is more similar to B than to C in the original data, then A should be closer to B than to C in the plot.

This rank-based approach makes NMDS flexible and robust. It imposes no distributional assumptions and works with any distance metric.

Stress

Stress measures how well the 2D arrangement reflects the original ranked distances. It is the key diagnostic for NMDS quality:

Stress Interpretation
< 0.05 Excellent
0.05 to 0.10 Good
0.10 to 0.20 Acceptable
0.20 to 0.30 Weak, interpret with caution
> 0.30 Poor, do not use

If stress exceeds 0.20, try increasing dimensions to k = 3, or check whether the distance metric and data preparation are appropriate.

Implementation

library(vegan)
library(ggplot2)

data(dune)
data(dune.env)

# Run NMDS with multiple random starts
set.seed(123)
nmds <- metaMDS(dune,
                distance = "bray",
                k = 2,
                trymax = 100,
                autotransform = FALSE)

nmds$stress
#> 0.118 — acceptable

# Shepard diagram: visual check of fit
stressplot(nmds)
# Points should follow the step function closely
# Extract scores and plot
scores_df <- as.data.frame(scores(nmds, display = "sites"))
scores_df$Management <- dune.env$Management

ggplot(scores_df, aes(x = NMDS1, y = NMDS2, colour = Management)) +
  geom_point(size = 3, alpha = 0.8) +
  stat_ellipse(linetype = 2) +
  labs(title = paste0("NMDS (stress = ", round(nmds$stress, 3), ")")) +
  theme_minimal()

Fitting environmental variables

envfit() projects environmental variables onto the ordination as vectors. The direction of each vector shows which end of the gradient corresponds to high values; the length reflects the strength of the correlation.

env_fit <- envfit(nmds, dune.env, permutations = 999)
env_fit
#> R2 and p-value for each variable

# Plot only significant variables
plot(nmds, type = "n")
points(nmds, display = "sites", pch = 19)
plot(env_fit, p.max = 0.05, col = "red")

Important: NMDS axes are arbitrary

Unlike PCA, NMDS axes have no inherent meaning. Their orientation and scale can change between runs. What matters is the relative position of samples, not their absolute coordinates or the direction of any axis.


PCoA

Principal coordinates analysis (PCoA), also called classical MDS, is closer in spirit to PCA: it produces an eigendecomposition so that the axes explain defined proportions of variance, and the result is deterministic. The key advantage over NMDS is that it preserves exact distances rather than just their ranks.

PCoA is the preferred choice when using phylogenetically informed distances such as UniFrac, where exact distance preservation matters.

library(ape)

dist_bray <- vegdist(dune, method = "bray")
pcoa_result <- pcoa(dist_bray)

# Variance explained
pcoa_result$values$Relative_eig[1:4]

# Extract and plot scores
pcoa_scores <- as.data.frame(pcoa_result$vectors[, 1:2])
names(pcoa_scores) <- c("PCo1", "PCo2")
pcoa_scores$Management <- dune.env$Management

var_exp <- round(pcoa_result$values$Relative_eig[1:2] * 100, 1)

ggplot(pcoa_scores, aes(PCo1, PCo2, colour = Management)) +
  geom_point(size = 3) +
  labs(title = "PCoA",
       x = paste0("PCo1 (", var_exp[1], "%)"),
       y = paste0("PCo2 (", var_exp[2], "%)")) +
  theme_minimal()

Choosing Between NMDS and PCoA

Both methods are valid for most ecological and microbiome data. The choice is usually minor in practice — if your data have clear structure, both will show it.

NMDS PCoA
Axes explain defined % variance No Yes
Deterministic result No Yes
Preserves exact distances No Yes
Handles non-Euclidean distances Yes Yes
Best for UniFrac Yes
Default for Bray-Curtis ecology Yes

When in doubt, run both and compare. Strong disagreement between them can indicate sensitivity to the distance metric or to rare samples.


Common Pitfalls

High stress in NMDS. Do not accept stress above 0.20. Try k = 3, more random starts (trymax = 200), or reconsider the distance metric.

Too few random starts. NMDS can converge to local minima. Always use trymax of at least 100.

Wrong distance for data type. Use Bray-Curtis for species counts and OTU tables, not Euclidean distance.

Over-interpreting axis orientation. NMDS axes can flip between runs. Report sample groupings and stress values, not axis directions.


Exercise

Using the dune dataset:

  1. Run NMDS with Bray-Curtis distance and check the stress value
  2. Plot the Shepard diagram and assess fit quality
  3. Colour the ordination plot by Use (land use type from dune.env)
  4. Fit environmental variables and identify which ones are significantly associated with the ordination
Solution
library(vegan)
library(ggplot2)

data(dune)
data(dune.env)

# 1. NMDS
set.seed(42)
nmds <- metaMDS(dune, distance = "bray", k = 2, trymax = 100,
                autotransform = FALSE)
nmds$stress

# 2. Shepard diagram
stressplot(nmds)

# 3. Score plot
scores_df <- as.data.frame(scores(nmds, display = "sites"))
scores_df$Use <- dune.env$Use

ggplot(scores_df, aes(NMDS1, NMDS2, colour = Use)) +
  geom_point(size = 3) +
  stat_ellipse(linetype = 2) +
  labs(title = paste0("NMDS (stress = ", round(nmds$stress, 3), ")")) +
  theme_minimal()

# 4. Environmental fit
env_fit <- envfit(nmds, dune.env, permutations = 999)
print(env_fit)

plot(nmds, type = "n")
points(nmds, display = "sites", pch = 19,
       col = as.numeric(dune.env$Use))
plot(env_fit, p.max = 0.05, col = "red")