
From Two to Many: Multivariate Statistics

▫︎ Midi-Key

Extended Sequential Dichotomous Key for Multivariate Statistics

Start at Step 1 and answer each question in turn. Each answer leads either to a method or to another step.
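Because every answer points either to another step or to a method, the key can be stored as a lookup table and walked mechanically. The Python sketch below encodes only Steps 1 to 3. The dictionary layout, the "go:" prefix marking a step reference, and the `walk` helper are hypothetical choices made for this illustration.

```python
# Hypothetical encoding of Steps 1-3 of the key. A value starting with "go:"
# points to another step; any other value is the name of a method.
KEY = {
    "1": {"yes": "go:2", "no": "go:6"},
    "2": {"categorical": "go:3a", "continuous": "go:4",
          "multiple continuous": "go:5"},
    "3a": {"two": "go:3b", "more than two": "go:3c"},
    "3b": {"continuous only": "Linear Discriminant Analysis (LDA)",
           "continuous and/or categorical": "Logistic Regression"},
    "3c": {"ordered": "Ordinal Logistic Regression",
           "unordered": "Multinomial Logistic Regression"},
}


def walk(key, answers, start="1"):
    """Follow the key from `start`, using one answer per step.
    Returns (method, list of visited steps)."""
    step, path = start, [start]
    for answer in answers:
        target = key[step][answer]
        if not target.startswith("go:"):
            return target, path            # reached a method
        step = target[len("go:"):]
        if step not in key:
            raise KeyError(f"Step {step} is not encoded in this sketch")
        path.append(step)
    raise ValueError(f"Ran out of answers at step {step}")


method, path = walk(KEY, ["yes", "categorical", "more than two", "ordered"])
print(method, path)  # Ordinal Logistic Regression ['1', '2', '3a', '3c']
```

The remaining steps can be added as further entries in the same form.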

1. Is there a defined dependent (outcome) variable?

Yes → go to Step 2
No / exploratory analysis → go to Step 6

2. Type of dependent variable

Categorical → go to Step 3
Continuous → go to Step 4
Multiple continuous dependent variables → go to Step 5

3. Categorical dependent variable

3a. Number of outcome categories

Two categories → go to Step 3b
More than two categories → go to Step 3c

3b. Two categories, predictor type

Predictors continuous only → Linear Discriminant Analysis (LDA)
Predictors continuous and/or categorical → Logistic Regression
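The LDA branch of Step 3b can be illustrated with a from-scratch two-class LDA. This NumPy sketch assumes the standard Gaussian model in which both classes share one covariance matrix. The function name `lda_fit` and the simulated data are illustrative only.

```python
import numpy as np


def lda_fit(X, y):
    """Two-class LDA with a pooled (shared) covariance matrix.
    Returns weights w and threshold c: predict class 1 when X @ w > c."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)
    # Midpoint of the projected class means, shifted by the log prior odds
    c = w @ (m0 + m1) / 2 - np.log(len(X1) / len(X0))
    return w, c


# Simulated data: two groups with different means on two continuous predictors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([4, 4], 1, (50, 2))])
y = np.repeat([0, 1], 50)

w, c = lda_fit(X, y)
accuracy = np.mean((X @ w > c).astype(int) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Logistic regression makes no normality assumption about the predictors and accepts dummy-coded categorical predictors, which is why the key sends mixed predictor types there. In practice it is fitted with a library such as statsmodels or scikit-learn rather than by hand.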

3c. More than two categories

Ordered categories → Ordinal Logistic Regression
Unordered categories → Multinomial Logistic Regression
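The two branches of Step 3c differ mainly in how each model turns predictors into category probabilities. This NumPy sketch assumes the common cumulative-logit (proportional-odds) form for the ordinal model and a softmax over per-category linear scores for the multinomial model. The coefficients are made up for illustration, not fitted.

```python
import numpy as np


def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def multinomial_probs(x, B):
    """Unordered outcome: one linear score per category (rows of B), then softmax."""
    return softmax(B @ x)


def ordinal_probs(x, beta, thresholds):
    """Ordered outcome (cumulative logit): P(Y <= k) = sigmoid(theta_k - x . beta),
    with increasing thresholds theta_k."""
    cum = sigmoid(np.asarray(thresholds) - x @ beta)
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)                  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)


x = np.array([0.5, -1.0])                # one observation, two predictors
p_ord = ordinal_probs(x, beta=np.array([1.0, 2.0]), thresholds=[-1.0, 1.0])
# First row of zeros: reference category
p_multi = multinomial_probs(x, B=np.array([[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]]))
print(p_ord, p_multi)
```

The ordinal model uses a single coefficient vector plus ordered thresholds. The multinomial model needs a separate coefficient vector for each category, with one row commonly fixed at zero as the reference.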

4. Continuous dependent variable

Predictors continuous and/or categorical → Multiple Linear Regression
Predictors categorical only → ANOVA
Categorical predictors plus continuous covariates → ANCOVA
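The models in Step 4 share one machinery: ordinary least squares on a design matrix in which categorical predictors are dummy-coded. This NumPy sketch assumes treatment (reference-level) coding. The data and coefficients are hypothetical and noise-free, so the fit can be checked against the known values.

```python
import numpy as np

# Hypothetical toy data: one continuous predictor and one categorical predictor
# with levels "a", "b", "c" ("a" is the reference level).
x_cont = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = np.array(["a", "b", "c", "a", "b", "c"])

# Treatment coding: one 0/1 column per non-reference level
dummies = np.column_stack([(group == g).astype(float) for g in ["b", "c"]])
X = np.column_stack([np.ones_like(x_cont), x_cont, dummies])  # intercept first

# Noise-free response built from known coefficients, so the fit should recover them
true_coef = np.array([1.0, 2.0, 0.5, -1.0])
y = X @ true_coef

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))
```

If only the intercept and dummy columns are kept, this becomes the one-way ANOVA model. Adding a continuous covariate to those columns gives ANCOVA.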

5. Multiple continuous dependent variables

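Categorical predictors (groups) → MANOVA
Categorical predictors plus continuous covariates → MANCOVA
Explanatory variables used to constrain an ordination of the responses → go to Step 9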
No predictors → go to Step 6
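MANOVA compares group mean vectors on several responses at once. One common test statistic, Wilks' lambda, can be computed in NumPy as below. The function name and simulated data are illustrative. Converting lambda into a p-value (via an F approximation) is left to a library routine, such as the MANOVA class in statsmodels.

```python
import numpy as np


def wilks_lambda(Y, groups):
    """One-way MANOVA statistic: det(E) / det(E + H), where E is the within-group
    and H the between-group sums-of-squares-and-cross-products matrix."""
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    E = np.zeros((p, p))
    H = np.zeros((p, p))
    for g in np.unique(groups):
        Yg = Y[groups == g]
        mg = Yg.mean(axis=0)
        E += (Yg - mg).T @ (Yg - mg)
        d = (mg - grand_mean)[:, None]        # column vector of mean differences
        H += len(Yg) * (d @ d.T)
    return np.linalg.det(E) / np.linalg.det(E + H)


rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 30)
same = rng.normal(0, 1, (90, 2))              # two responses, no group effect
shift = np.repeat([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]], 30, axis=0)
shifted = same + shift                        # groups now differ in mean
lam_same = wilks_lambda(same, groups)
lam_shifted = wilks_lambda(shifted, groups)
print(lam_same, lam_shifted)
```

Values near 1 indicate little separation between the group mean vectors. Values near 0 indicate strong separation.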

6. Exploratory analysis (no dependent variable)

Goal: Reduce dimensionality / find structure → go to Step 7
Goal: Group observations (clustering) → go to Step 10
Goal: Relate two sets of variables → go to Step 13

7. Dimension reduction (linear / Euclidean)

Variables continuous and Euclidean distance makes sense → PCA
Distance/dissimilarity matrix required → go to Step 8
Latent constructs / model-based factors → Factor Analysis
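The PCA branch of Step 7 can be illustrated with a NumPy sketch that computes principal components from the singular value decomposition of the column-centred data. The simulated variables (three noisy copies of one latent direction) are illustrative. When variables are on different scales, the columns are usually standardized first, which amounts to PCA on the correlation matrix.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the column-centred data matrix.
    Returns scores, loadings (rows = components), and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]


rng = np.random.default_rng(2)
t = rng.normal(size=200)
# Three correlated variables driven mostly by one latent direction t
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.1, size=(200, 3))
scores, loadings, ratio = pca(X)
print(ratio)
```

Factor analysis differs in that it models each variable's unique (error) variance separately instead of decomposing the total variance. It therefore needs a dedicated estimation routine rather than a single SVD.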

8. Distance-based ordination

Non-metric, rank-based, iterative → NMDS
Metric / eigen decomposition → PCoA
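Classical PCoA works from the distance matrix alone. This NumPy sketch implements the standard double-centring recipe and assumes a symmetric dissimilarity matrix. The check uses Euclidean distances. For these, PCoA recovers the original configuration up to rotation, so the distances are reproduced.

```python
import numpy as np


def pcoa(D, n_axes=2):
    """Classical (metric) PCoA: double-centre -0.5 * D**2 and eigendecompose it."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]              # largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Negative eigenvalues (non-Euclidean D) cannot give real axes; clip at zero
    scale = np.sqrt(np.maximum(eigval[:n_axes], 0.0))
    return eigvec[:, :n_axes] * scale, eigval


def pairwise_dist(P):
    return np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))


rng = np.random.default_rng(3)
points = rng.normal(size=(10, 2))
D = pairwise_dist(points)
coords, eigval = pcoa(D, n_axes=2)
print(np.allclose(pairwise_dist(coords), D))  # True for Euclidean input
```

With non-Euclidean dissimilarities (e.g., Bray-Curtis), some eigenvalues become negative, and this sketch simply truncates those axes. NMDS is an iterative optimization and is not shown here.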

9. Constrained ordination (include explanatory variables)

Response variables continuous, linear response → RDA (constrained PCA)
Response variables unimodal → CCA (Canonical Correspondence Analysis, i.e. constrained CA)
Distance-based constraints → dbRDA (constrained PCoA)
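RDA can be viewed as two familiar steps: a multivariate regression of the responses on the explanatory variables, followed by PCA of the fitted values. This NumPy sketch assumes centred data and omits scaling options and permutation tests. The names and simulated data are illustrative. Full implementations of RDA, CCA and dbRDA exist, for example, as `rda`, `cca` and `dbrda` in the R package vegan.

```python
import numpy as np


def rda(Y, X, n_axes=2):
    """Redundancy analysis sketch: regress each centred response column on the
    centred explanatory variables, then run PCA (via SVD) on the fitted values."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B
    U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    constrained_fraction = np.sum(s**2) / np.sum(Yc**2)
    site_scores = U[:, :n_axes] * s[:n_axes]
    return site_scores, Vt[:n_axes], constrained_fraction


rng = np.random.default_rng(4)
X = rng.normal(size=(40, 2))                                      # two explanatory variables
Y = X @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(40, 5))  # five responses
scores, axes, frac = rda(Y, X)
print(f"variance explained by constraints: {frac:.2f}")
```

The fitted values lie in the space spanned by the explanatory variables. As a result, the number of constrained axes cannot exceed the number of explanatory variables.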

10. Clustering / grouping

Number of clusters known in advance → k-Means Clustering
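k-means is usually run through a library (scikit-learn's `KMeans`, for instance), but the underlying Lloyd's algorithm is short. This NumPy sketch assumes Euclidean distance and a single random initialization. The helper names and the two simulated blobs are illustrative.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initial centroids drawn from the data."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids


rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
```

The result depends on the starting centroids. Practical implementations therefore restart several times or use smarter seeding such as k-means++, then keep the solution with the lowest within-cluster sum of squares.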
Number of clusters unknown → go to Step 11

11. Unknown-cluster structure

Hierarchical structure desired → Hierarchical Clustering
Density-based structure desired → DBSCAN
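For the density-based branch, here is a compact DBSCAN sketch in NumPy. It assumes Euclidean distance and the usual two parameters: `eps`, the neighbourhood radius, and `min_samples`, the neighbour count that makes a point a core point. In this sketch that count includes the point itself. The data are simulated. For the hierarchical branch, SciPy's `scipy.cluster.hierarchy.linkage` is a common starting point.

```python
import numpy as np


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN. Returns labels 0, 1, ... for clusters and -1 for noise."""
    n = len(X)
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]  # includes i
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster                   # seed a new cluster at core point i
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue                      # border points join but do not expand
            for nb in neighbours[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    stack.append(nb)
        cluster += 1
    return labels


rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.2, (25, 2)),
               rng.normal(4, 0.2, (25, 2)),
               [[10.0, 10.0]]])               # one isolated point
labels = dbscan(X, eps=1.0, min_samples=4)
print(labels)
```

Unlike k-means, DBSCAN does not need the number of clusters in advance. It can also leave outlying points unassigned (label -1).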

12. (Reserved for future branching)

(Currently empty to maintain sequential numbering)

13. Relating two sets of variables

Both sets continuous → Canonical Correlation Analysis (CCA)
One set categorical, one set continuous → Discriminant Analysis
Distance-based relationships → dbRDA
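Canonical correlations between two continuous sets are the cosines of the principal angles between the column spaces of the two centred data matrices. This NumPy sketch uses QR and SVD and assumes both matrices have full column rank and more rows than columns. The simulated data, in which one Y column is an exact linear combination of X columns, are illustrative.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations as the singular values of Qx.T @ Qy, where Qx and Qy
    are orthonormal bases for the centred columns of X and Y."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # sorted, largest first


rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
# First Y column depends exactly on X; second is unrelated noise
Y = np.column_stack([X[:, 0] + X[:, 1], rng.normal(size=100)])
r = canonical_correlations(X, Y)
print(np.round(r, 3))
```

One canonical correlation is returned per dimension of the smaller set. Here the first is (numerically) 1 because the first Y column lies in the space spanned by X.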

Notes

LDA: supervised classification of a categorical outcome using continuous predictors
PCA: linear, Euclidean, continuous variables
PCoA: eigen decomposition of a distance/dissimilarity matrix
NMDS: non-metric, iterative, rank-preserving
RDA: constrained linear ordination
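CCA (Canonical Correspondence Analysis, Step 9): constrained unimodal ordination; distinct from Canonical Correlation Analysis in Step 13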
dbRDA: constrained distance-based ordination
Step numbers are sequential, with no gaps (Step 12 is a placeholder)
Always check each method's assumptions (e.g., normality, linearity, equal covariance matrices) before applying it