Extended Sequential Dichotomous Key for Multivariate Statistics
Start at Step 1 and follow the branches. Each branch either names a method or points to another step.
1. Is there a defined dependent (outcome) variable?
- Yes → go to Step 2
- No → go to Step 6
2. Type of dependent variable
- Categorical → go to Step 3
- Continuous → go to Step 4
- Multiple continuous dependent variables → go to Step 5
3. Categorical dependent variable
3a. Number of outcome categories
- Two categories → go to Step 3b
- More than two categories → go to Step 3c
3b. Two categories, predictor type
- Predictors continuous only → Linear Discriminant Analysis (LDA)
- Predictors continuous and/or categorical → Logistic Regression
3c. More than two categories
- Ordered categories → Ordinal Logistic Regression
- Unordered categories → Multinomial Logistic Regression
4. Continuous dependent variable
- Predictors continuous and/or categorical → Multiple Linear Regression
- Predictors categorical only → ANOVA (ANCOVA if continuous covariates are also adjusted for)
5. Multiple continuous dependent variables
- Predictors present → MANOVA (MANCOVA if continuous covariates are also adjusted for)
- No predictors → go to Step 6
6. Exploratory analysis (no dependent variable)
- Goal: Reduce dimensionality / find structure → go to Step 7
- Goal: Group observations (clustering) → go to Step 10
- Goal: Relate two sets of variables → go to Step 13
7. Dimension reduction
- Variables continuous and Euclidean distance is meaningful → PCA
- Distance/dissimilarity matrix required → go to Step 8
- Latent constructs / model-based factors → Factor Analysis
8. Distance-based ordination
- Non-metric, rank-based, iterative → NMDS
- Metric / eigen decomposition → PCoA
9. Constrained ordination (enter here when explanatory variables are to be included)
- Response variables continuous, linear response → RDA (Redundancy Analysis; constrained PCA)
- Response variables with unimodal response → CCA (Canonical Correspondence Analysis; constrained CA)
- Distance-based constraints → dbRDA (distance-based RDA; constrained PCoA)
10. Clustering / grouping
- Number of clusters known in advance → k-Means Clustering
- Number of clusters unknown → go to Step 11
11. Unknown-cluster structure
- Hierarchical structure desired → Hierarchical Clustering
- Density-based structure desired → DBSCAN
12. (Reserved for future branching)
- (Currently empty to maintain sequential numbering)
13. Relating two sets of variables
- Both sets continuous → Canonical Correlation Analysis (CCorA; not the CCA of Step 9)
- One set categorical, one set continuous → Discriminant Analysis
- Distance-based relationships → dbRDA
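The key above can be sketched as a lookup table plus a small walker. This is only an illustration under stated assumptions: the answer labels ("yes", "cluster", "metric", and so on) are shorthand invented for this sketch, Step 1 is assumed to send "yes" to Step 2 and "no" to Step 6, and Step 3a is assumed to send two categories to 3b and more than two to 3c. The names `KEY` and `follow_key` are hypothetical.

```python
# Each step maps an answer label to either another step or a method name.
# Answer labels are illustrative shorthand, not wording from the key.
KEY = {
    1: {"yes": 2, "no": 6},                    # assumed branches
    2: {"categorical": 3, "continuous": 4, "multiple continuous": 5},
    3: {"two": "3b", "more than two": "3c"},   # Step 3a, assumed branches
    "3b": {"continuous only": "Linear Discriminant Analysis",
           "continuous and/or categorical": "Logistic Regression"},
    "3c": {"ordered": "Ordinal Logistic Regression",
           "unordered": "Multinomial Logistic Regression"},
    4: {"continuous and/or categorical": "Multiple Linear Regression",
        "categorical only": "ANOVA / ANCOVA"},
    5: {"predictors": "MANOVA / MANCOVA", "no predictors": 6},
    6: {"reduce dimensionality": 7, "cluster": 10, "relate two sets": 13},
    7: {"euclidean": "PCA", "distance matrix": 8,
        "latent constructs": "Factor Analysis"},
    8: {"non-metric": "NMDS", "metric": "PCoA"},
    9: {"linear": "RDA", "unimodal": "CCA", "distance-based": "dbRDA"},
    10: {"known": "k-Means Clustering", "unknown": 11},
    11: {"hierarchical": "Hierarchical Clustering",
         "density-based": "DBSCAN"},
    13: {"both continuous": "Canonical Correlation Analysis",
         "categorical and continuous": "Discriminant Analysis",
         "distance-based": "dbRDA"},
}


def follow_key(answers, start=1):
    """Walk the key from `start`, consuming one answer per step."""
    step = start
    for answer in answers:
        options = KEY[step]
        if answer not in options:
            raise ValueError(
                f"Step {step} has no branch {answer!r}; "
                f"choose from {sorted(options)}"
            )
        outcome = options[answer]
        if outcome not in KEY:
            return outcome  # a method name, so the walk ends here
        step = outcome
    raise ValueError(f"Ran out of answers at Step {step}")


if __name__ == "__main__":
    print(follow_key(["yes", "categorical", "more than two", "ordered"]))
    print(follow_key(["no", "cluster", "unknown", "density-based"]))
    print(follow_key(["linear"], start=9))  # enter directly at Step 9
```

The `start` argument lets a walk begin at any step, which is how Step 9 is entered in the last call. Keeping the key as data rather than nested conditionals means a new branch can be added by editing one dictionary entry.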
Notes
- LDA: supervised classification of a categorical outcome using continuous predictors
- PCA: linear, Euclidean, continuous variables
- PCoA: eigen decomposition of a distance/dissimilarity matrix
- NMDS: non-metric, iterative, rank-preserving
- RDA: constrained linear ordination
- CCA: Canonical Correspondence Analysis, constrained unimodal ordination (distinct from Canonical Correlation Analysis in Step 13)
- dbRDA: constrained distance-based ordination
- Step numbers are sequential with no gaps (Step 12 is reserved)
- Always check assumptions before applying methods
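The PCoA note can be made concrete with a small sketch. Assuming a symmetric distance matrix, PCoA double-centres the squared distances and scales each eigenvector of the result by the square root of its eigenvalue. The pure-Python version below recovers only the leading axis, using power iteration as a stand-in for a full eigen decomposition (a real analysis would normally rely on a linear algebra library). The function names `double_center` and `leading_axis` and the three-point example are illustrative assumptions, not part of the key.

```python
import math


def double_center(dist):
    """Gower-centre the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so row means double as column means.
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]


def leading_axis(b, iterations=200):
    """Power iteration: leading eigenvalue and unit eigenvector of b."""
    n = len(b)
    # Arbitrary non-constant start; a constant vector would be
    # annihilated by the centred matrix.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


if __name__ == "__main__":
    # Pairwise distances between three points lying at 0, 1 and 3 on a line.
    dist = [[0.0, 1.0, 3.0],
            [1.0, 0.0, 2.0],
            [3.0, 2.0, 0.0]]
    eigenvalue, vector = leading_axis(double_center(dist))
    coords = [math.sqrt(eigenvalue) * x for x in vector]
    print([round(c, 3) for c in coords])
```

Because the example distances are Euclidean and one-dimensional, the single recovered axis reproduces the original spacing: the first and last points end up 3 units apart. With Euclidean distances PCoA yields the same configuration as PCA, which is why the key reserves PCoA for cases where another distance or dissimilarity measure is required.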