taxdiv
Taxonomic Diversity Indices Using Deng Entropy
Calculates taxonomic diversity indices for ecological community data using the Deng entropy framework and classical approaches (Shannon, Simpson, Clarke & Warwick). Provides functions for computing taxonomic distinctness, average taxonomic distinctness (AvTD/Delta+), variation in taxonomic distinctness (VarTD/Lambda+), and Deng entropy-based measures that incorporate taxonomic hierarchy information. Includes tools for constructing taxonomic trees and computing pairwise taxonomic distances.
# taxdiv Example Outputs and Methods Guide
This folder contains example outputs generated by the `taxdiv` package and
explains what each method measures. A hypothetical Mediterranean forest
community with 10 tree species is used. The taxonomic hierarchy has 7 levels:
**Species > Genus > Family > Order > Class > Phylum > Kingdom**
---
## Table of Contents
1. [Example Communities](#example-communities)
2. [Shannon Index](#1-shannon-index)
3. [Simpson Index](#2-simpson-index)
4. [Pielou Evenness Index](#3-pielou-evenness-index)
5. [Taxonomic Distance Matrix](#4-taxonomic-distance-matrix)
6. [Clarke & Warwick Indices](#5-clarke--warwick-indices)
7. [Deng Entropy](#6-deng-entropy)
8. [Ozkan pTO](#7-ozkan-pto-taxonomic-diversity)
9. [Rarefaction Curve](#8-rarefaction-curve)
10. [Visualizations](#9-visualizations)
11. [References](#references)
---
## Example Communities
Two different scenarios are used in the examples. The goal is to show how
indices respond when the same species are present but abundance distributions
change.
### "Diverse" (Balanced Distribution)
All species have similar abundances. No single species is dominant.
```
Pinus_brutia = 30, Quercus_coccifera = 25, Fagus_orientalis = 20,
Quercus_infectoria = 18, Cedrus_libani = 15, Pinus_nigra = 12,
Carpinus_betulus = 10, Juniperus_excelsa = 8, Abies_cilicica = 7,
Juniperus_oxycedrus = 5
```
### "Dominant" (Unequal Distribution)
One species (Quercus coccifera = 80 individuals) dominates all others.
The rest have between 1 and 5 individuals.
```
Quercus_coccifera = 80, Quercus_infectoria = 5, Pinus_brutia = 3,
Cedrus_libani = 3, Pinus_nigra = 2, Juniperus_excelsa = 2,
Fagus_orientalis = 2, Juniperus_oxycedrus = 1, Abies_cilicica = 1,
Carpinus_betulus = 1
```
> **Note:** "Diverse" and "Dominant" are not formal ecological terms.
> They are labels describing the evenness of the abundance distribution.
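For readers who want to follow along, the two communities can be written as named numeric vectors. This is a minimal sketch: the abundances are copied from the listings above, but the object names `diverse` and `dominant`, and the assumption that the package accepts this exact input format, are illustrative.

```r
# Abundances copied from the two listings above; object names are illustrative.
diverse <- c(
  Pinus_brutia = 30, Quercus_coccifera = 25, Fagus_orientalis = 20,
  Quercus_infectoria = 18, Cedrus_libani = 15, Pinus_nigra = 12,
  Carpinus_betulus = 10, Juniperus_excelsa = 8, Abies_cilicica = 7,
  Juniperus_oxycedrus = 5
)
dominant <- c(
  Quercus_coccifera = 80, Quercus_infectoria = 5, Pinus_brutia = 3,
  Cedrus_libani = 3, Pinus_nigra = 2, Juniperus_excelsa = 2,
  Fagus_orientalis = 2, Juniperus_oxycedrus = 1, Abies_cilicica = 1,
  Carpinus_betulus = 1
)

# Same species list, different abundance distribution
setequal(names(diverse), names(dominant))  # TRUE
sum(diverse)                               # 150 individuals
sum(dominant)                              # 100 individuals
max(dominant) / sum(dominant)              # 0.8: one species holds 80% of individuals
```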
---
## 1. Shannon Index
**Function:** `shannon(community)`
**What does it measure?** The uncertainty in a community. If you randomly
pick an individual, how difficult is it to predict its species?
**Formula:**
```
H' = -sum(p_i * ln(p_i))
```
`p_i` = proportional abundance of species i (abundance of species i / total abundance)
**How to interpret:**
- H' = 0: Only one species exists. Prediction is trivial.
- High H': Many species with balanced abundances. Prediction is difficult = high diversity.
**Example:**
- 3 species with equal abundance: H' = ln(3) = 1.099
- 10 species with equal abundance: H' = ln(10) = 2.303
- 10 species but one dominant: H' < 2.303 (dominant species reduces uncertainty)
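The example values above can be checked with a few lines of base R. This is a sketch of the formula only; `shannon_sketch` is a hypothetical helper, not the package's `shannon()` function.

```r
# Shannon index from raw abundances (base R only)
shannon_sketch <- function(abund) {
  p <- abund[abund > 0] / sum(abund)  # proportional abundances, zeros dropped
  -sum(p * log(p))                    # log() is the natural logarithm in R
}

h3  <- shannon_sketch(rep(1, 3))    # equals ln(3)
h10 <- shannon_sketch(rep(1, 10))   # equals ln(10)
hd  <- shannon_sketch(c(80, 5, 3, 3, 2, 2, 2, 1, 1, 1))  # one dominant species

round(c(h3, h10), 3)
hd < h10   # TRUE: dominance lowers the index
```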
**Reference:** Shannon, C.E. (1948). A mathematical theory of communication.
*Bell System Technical Journal*, 27, 379-423.
---
## 2. Simpson Index
**Function:** `simpson(community)`
**What does it measure?** The probability that two randomly selected
individuals belong to different species.
**Formula (Gini-Simpson):**
```
1 - D = 1 - sum(p_i^2)
```
**How to interpret:**
- Close to 0: One species dominates. Two random individuals are likely the same species.
- Close to 1: Many species with even abundances. Two random individuals are likely different species. The maximum for S species is 1 - 1/S (0.9 for 10 species).
**How does it differ from Shannon?** Simpson is more sensitive to dominant
species. Shannon gives more weight to rare species. Both are typically
used together.
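A minimal base-R sketch of the Gini-Simpson formula follows. `gini_simpson_sketch` is a hypothetical helper; the package's `simpson()` may differ in details such as a finite-sample correction, which is not assumed here.

```r
# Gini-Simpson index: 1 - sum(p_i^2)
gini_simpson_sketch <- function(abund) {
  p <- abund / sum(abund)
  1 - sum(p^2)
}

even <- gini_simpson_sketch(rep(1, 10))                        # 1 - 1/10
dom  <- gini_simpson_sketch(c(80, 5, 3, 3, 2, 2, 2, 1, 1, 1))  # one dominant species

even  # 0.9
dom   # 0.3542
```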
**Reference:** Simpson, E.H. (1949). Measurement of diversity.
*Nature*, 163, 688.
---
## 3. Pielou Evenness Index
**Function:** No dedicated function; J can be calculated from the output of `shannon()`.
**What does it measure?** How evenly abundances are distributed across species.
**Formula:**
```
J = H' / ln(S)
```
`H'` = Shannon index, `S` = number of species
**How to interpret:**
- J = 1: All species have equal abundance. Perfect evenness.
- J close to 0: A single species almost completely dominates the community. (J is undefined for S = 1, because ln(1) = 0.)
- J = 0.8: A well-balanced community.
- J = 0.3: Significant dominance present.
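Because there is no dedicated function, a short base-R sketch may help. `pielou_sketch` is a hypothetical helper that recomputes H' itself rather than calling the package.

```r
# Pielou evenness: J = H' / ln(S)
pielou_sketch <- function(abund) {
  abund <- abund[abund > 0]            # only species that are present
  p <- abund / sum(abund)
  h <- -sum(p * log(p))                # Shannon H'
  h / log(length(abund))               # NaN (0/0) if only one species is present
}

j_even <- pielou_sketch(rep(5, 10))                          # perfectly even
j_dom  <- pielou_sketch(c(80, 5, 3, 3, 2, 2, 2, 1, 1, 1))    # one dominant species

j_even          # 1
round(j_dom, 2) # about 0.40
```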
**Reference:** Pielou, E.C. (1966). The measurement of diversity in
different types of biological collections. *Journal of Theoretical
Biology*, 13, 131-144.
---
## 4. Taxonomic Distance Matrix
**Function:** `tax_distance_matrix(tax_tree)`
**What does it measure?** The taxonomic distance between every pair of species.
In other words, "how different are these two species taxonomically?"
**How is it calculated?** The level at which two species first converge
in the taxonomic tree is identified:
```
Example: Pinus nigra and Quercus cerris
- Same genus? No (Pinus vs Quercus) -> 1 step
- Same family? No (Pinaceae vs Fagaceae) -> 2 steps
- Same order? No (Pinales vs Fagales) -> 3 steps
- Same class? No (Pinopsida vs Magnoliopsida) -> 4 steps
- Same phylum? No (Pinophyta vs Magnoliophyta) -> 5 steps
- Same kingdom? Yes (Plantae) -> Distance = 5
```
```
Example: Pinus nigra and Abies nordmanniana
- Same genus? No -> 1 step
- Same family? Yes (Pinaceae) -> Distance = 1
```
The higher the level at which they converge, the greater the taxonomic distance.
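The step counting in the two examples above can be sketched in base R. The taxonomy table, the helper name `step_distance`, and the rule "distance = number of ranks from genus upward at which two species differ" are assumptions taken from the worked examples; whether `tax_distance_matrix()` uses the same scale is not confirmed here.

```r
# Small taxonomy table (one row per species), matching the examples above
tax <- data.frame(
  Genus   = c("Pinus", "Abies", "Quercus"),
  Family  = c("Pinaceae", "Pinaceae", "Fagaceae"),
  Order   = c("Pinales", "Pinales", "Fagales"),
  Class   = c("Pinopsida", "Pinopsida", "Magnoliopsida"),
  Phylum  = c("Pinophyta", "Pinophyta", "Magnoliophyta"),
  Kingdom = c("Plantae", "Plantae", "Plantae"),
  row.names = c("Pinus_nigra", "Abies_nordmanniana", "Quercus_cerris"),
  stringsAsFactors = FALSE
)

# Distance = number of ranks at which the two species differ
step_distance <- function(tax) {
  n <- nrow(tax)
  d <- matrix(0, n, n, dimnames = list(rownames(tax), rownames(tax)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i != j) d[i, j] <- sum(unlist(tax[i, ]) != unlist(tax[j, ]))
  }
  d
}

d <- step_distance(tax)
d["Pinus_nigra", "Quercus_cerris"]      # 5
d["Pinus_nigra", "Abies_nordmanniana"]  # 1
```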
**Related figure:** `heatmap.png`
**Reference:** Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness
index and its statistical properties. *Journal of Applied Ecology*,
35, 523-531.
---
## 5. Clarke & Warwick Indices
### 5a. Delta (Average Taxonomic Distance)
**Function:** `delta(community, tax_tree)`
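**What does it measure?** The expected taxonomic distance between two
randomly selected individuals from the community. Two individuals of the
same species count as distance 0. Uses abundance data. Clarke & Warwick
(1998) call this index "taxonomic diversity".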
**Formula:**
```
Delta = [sum_i<j (w_ij * n_i * n_j)] / [N * (N-1) / 2]
```
`w_ij` = taxonomic distance between species i and j,
`n_i` = abundance of species i, `N` = total abundance
**Interpretation:** High Delta = species are taxonomically distant from
each other AND abundances are balanced.
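A toy calculation shows how the pieces fit together. The abundances and distances below are hypothetical, and `delta_sketch` is an illustrative helper following Clarke & Warwick's definition (each species pair counted once), not the package's `delta()`.

```r
# Toy data: 3 species with hypothetical abundances and step distances
n <- c(sp1 = 4, sp2 = 3, sp3 = 3)
w <- matrix(c(0, 1, 5,
              1, 0, 5,
              5, 5, 0), nrow = 3, byrow = TRUE,
            dimnames = list(names(n), names(n)))

delta_sketch <- function(n, w) {
  N <- sum(n)
  cross <- outer(n, n) * w                 # w_ij * n_i * n_j for every ordered pair
  sum(cross[upper.tri(cross)]) / (N * (N - 1) / 2)  # each pair i < j counted once
}

delta_val <- delta_sketch(n, w)
delta_val  # (12 + 60 + 45) / 45 = 2.6
```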
### 5b. Delta* (Taxonomic Distance - Abundance Normalized)
**Function:** `delta_star(community, tax_tree)`
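**What does it measure?** Same as Delta, but restricted to pairs of
individuals that belong to different species: the numerator of Delta is
divided by `sum_i<j (n_i * n_j)` instead of `N * (N-1) / 2`. This reduces
the influence of the abundance distribution, so the value mainly reflects
taxonomic structure. Clarke & Warwick (1998) call this index "taxonomic
distinctness".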
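As a sketch under Clarke & Warwick's definition, Delta* divides the same weighted sum as Delta by the number of between-species pairs, `sum_i<j (n_i * n_j)`. The data and the helper `delta_star_sketch` are hypothetical; the package's `delta_star()` is not called.

```r
# Toy data: 3 species with hypothetical abundances and step distances
n <- c(sp1 = 4, sp2 = 3, sp3 = 3)
w <- matrix(c(0, 1, 5,
              1, 0, 5,
              5, 5, 0), nrow = 3, byrow = TRUE,
            dimnames = list(names(n), names(n)))

delta_star_sketch <- function(n, w) {
  pairs <- outer(n, n)                     # n_i * n_j for every ordered pair
  up <- upper.tri(pairs)                   # keep each pair i < j once
  sum((pairs * w)[up]) / sum(pairs[up])    # only between-species pairs in the denominator
}

delta_star_val <- delta_star_sketch(n, w)
delta_star_val  # 117 / 33, about 3.55
```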
### 5c. AvTD / Delta+ (Average Taxonomic Distinctness)
**Function:** `avtd(species_names, tax_tree)`
**What does it measure?** Average taxonomic distance based solely on
species lists (presence/absence). Does NOT use abundance data.
**Why is it important?** You can compare sites with different sample
sizes, because it is abundance-independent.
**Formula:**
```
Delta+ = [sum_i<j w_ij] / [S * (S-1) / 2]
```
`S` = number of species, `w_ij` = taxonomic distance between species
**Interpretation:** High AvTD = species are taxonomically distant from
each other. Low AvTD = species are closely related (e.g., all from the
same family).
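Since Delta+ is simply the mean of the pairwise distances, it can be sketched in one line of base R. The distance matrix is hypothetical and `avtd_sketch` is an illustrative helper, not the package's `avtd()`.

```r
# Hypothetical distance matrix for 3 species
w <- matrix(c(0, 1, 5,
              1, 0, 5,
              5, 5, 0), nrow = 3, byrow = TRUE)

# Delta+ = mean of the S * (S-1) / 2 distances above the diagonal
avtd_sketch <- function(w) mean(w[upper.tri(w)])

avtd_val <- avtd_sketch(w)
avtd_val  # (1 + 5 + 5) / 3, about 3.67
```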
### 5d. VarTD / Lambda+ (Variation in Taxonomic Distinctness)
**Function:** `vartd(species_names, tax_tree)`
**What does it measure?** How variable the taxonomic distances are.
Are some species pairs very close while others are very distant?
**Interpretation:** High VarTD = uneven taxonomic structure (some
species are very close, others very distant). Low VarTD = homogeneous
distances between species.
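Following Clarke & Warwick (2001), Lambda+ is the mean squared deviation of the pairwise distances from Delta+, that is `Lambda+ = [sum_i<j (w_ij - Delta+)^2] / [S * (S-1) / 2]`. The sketch below uses hypothetical matrices and an illustrative helper `vartd_sketch`, not the package's `vartd()`.

```r
# Lambda+: variance of pairwise distances around their mean (divides by the
# number of pairs, unlike var(), which divides by the number of pairs minus 1)
vartd_sketch <- function(w) {
  d <- w[upper.tri(w)]
  mean((d - mean(d))^2)
}

uneven <- matrix(c(0, 1, 5,
                   1, 0, 5,
                   5, 5, 0), nrow = 3, byrow = TRUE)  # one close pair, two distant
homog  <- matrix(c(0, 3, 3,
                   3, 0, 3,
                   3, 3, 0), nrow = 3, byrow = TRUE)  # all pairs equally distant

v_uneven <- vartd_sketch(uneven)   # 32 / 9, about 3.56
v_homog  <- vartd_sketch(homog)    # 0
```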
**Reference (5a-5d):** Clarke, K.R. & Warwick, R.M. (1998). A
taxonomic distinctness index and its statistical properties. *Journal
of Applied Ecology*, 35, 523-531.
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index
applicable to species lists: variation in taxonomic distinctness.
*Marine Ecology Progress Series*, 216, 265-278.
---
## 6. Deng Entropy
**Function:** `deng_entropy_level(counts, group_sizes)`
**What does it measure?** Classic Shannon entropy treats each species
as an independent event. Deng entropy considers that species belong to
groups, and groups belong to higher groups. In other words, it accounts
for the taxonomic hierarchy.
**Key idea:** In Shannon entropy, each event (species) is atomic.
In Deng entropy, events (species) have set structure. It operates
within the Dempster-Shafer evidence theory framework.
**Formula:**
```
Ed = -sum(m_i * ln(m_i / (2^|F_i| - 1)))
```
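`m_i` = mass (proportional weight) assigned to group `F_i`,
`|F_i|` = number of elements (e.g., species) in group `F_i`.
When every group has size 1, `2^|F_i| - 1 = 1` and the formula reduces to
Shannon entropy.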
**Interpretation:** If one family has 5 species and another has 1 species,
Deng entropy accounts for this set structure. Shannon entropy cannot
distinguish this.
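The effect of group size can be seen in a small base-R sketch of the formula. `deng_sketch` is a hypothetical helper that takes masses and group sizes directly; it makes no assumption about how `deng_entropy_level(counts, group_sizes)` interprets its arguments.

```r
# Deng entropy: m = masses summing to 1, size = number of elements per group
deng_sketch <- function(m, size) {
  keep <- m > 0                                   # zero masses contribute nothing
  -sum(m[keep] * log(m[keep] / (2^size[keep] - 1)))
}

# Two groups of size 1: identical to Shannon entropy of (0.5, 0.5)
e_flat <- deng_sketch(c(0.5, 0.5), c(1, 1))       # ln(2)

# Same masses, but the first group now contains 5 species
e_set <- deng_sketch(c(0.5, 0.5), c(5, 1))        # ln(2) + 0.5 * ln(31)

round(c(e_flat, e_set), 3)
```

Shannon entropy would return the same value in both cases, because it only sees the masses.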
**Reference:** Deng, Y. (2016). Deng entropy. *Chaos, Solitons & Fractals*,
91, 549-553.
---
## 7. Ozkan pTO (Taxonomic Diversity)
**Function:** `ozkan_pto(community, tax_tree)` and `pto_components(community, tax_tree)`
**What does it measure?** Uses Deng entropy to jointly measure taxonomic
diversity and taxonomic distance. Produces 4 components.
### 4 Components:
| Component | Uses abundance? | What does it measure? |
|-----------|-----------------|----------------------|
| **uTO** | Yes (Run 1+2+3) | Unweighted taxonomic diversity |
| **TO** | Yes (Run 1+2+3) | Weighted taxonomic diversity |
| **uTO+** | No (Run 1 only) | Unweighted taxonomic distance |
| **TO+** | No (Run 1 only) | Weighted taxonomic distance |
### Run 1 (Deterministic):
All species are included with equal weight. Deng entropy is calculated
at each taxonomic level. Result: baseline value.
### Run 2 (Stochastic Resampling):
Each species is included or excluded with 50% probability. This is
repeated at least 101 times, and pTO is recalculated each time. The
maximum value of each component is kept.
**Function:** `ozkan_pto_resample(community, tax_tree, n_iter = 101)`
### Run 3 (Sensitivity Analysis):
Uses the maximum values from Run 2 as reference, and tests stability
with a 5% sensitivity threshold. Counts how many iterations are close
to the reference value.
**Function:** `ozkan_pto_sensitivity(community, tax_tree, n_iter = 101)`
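The resampling logic of Run 2 and Run 3 can be sketched generically. Everything here is an assumption made for illustration: the community is hypothetical, a Shannon-type function stands in for pTO, an empty draw is scored as 0, and "close to the reference" is read as "at least 95% of the maximum". It is not the package's implementation.

```r
set.seed(42)                                          # reproducible draws
community <- c(sp1 = 30, sp2 = 25, sp3 = 20, sp4 = 10, sp5 = 5)  # hypothetical

index_fun <- function(abund) {                        # stand-in index, NOT pTO
  p <- abund / sum(abund)
  -sum(p * log(p))
}

n_iter <- 101
values <- numeric(n_iter)
for (k in seq_len(n_iter)) {
  keep <- runif(length(community)) < 0.5              # keep each species with probability 0.5
  values[k] <- if (any(keep)) index_fun(community[keep]) else 0  # empty draw scored as 0 (assumption)
}

reference <- max(values)                              # Run 2: maximum over all iterations
n_close <- sum(values >= 0.95 * reference)            # Run 3: iterations within 5% of the reference
n_close / n_iter                                      # share of iterations near the maximum
```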
### Why is TO+ based on Run 1 only?
Taxonomic distance (TO+) measures how far apart species are from each
other. This is unrelated to abundance. A species with 5 individuals and
one with 500 individuals are still in the same family. Therefore, it does
not require the slicing procedure (Run 2/3).
Taxonomic diversity (TO) must also account for abundance. The slicing
procedure ensures that abundant species survive more slicing steps,
thereby indirectly incorporating abundance into the system.
**Reference:** Ozkan, K. (2018). A new equation proposed for measuring
taxonomic diversity. *Turkish Journal of Forestry*, 19(4), 336-346.
DOI: 10.18182/tjf.441061
---
## 8. Rarefaction Curve
**Function:** `rarefaction_taxonomic(community, tax_tree, index = "shannon")`
**What does it measure?** It answers the question: "Is my sample size sufficient?"
### The Problem:
Suppose two researchers are working at different sites:
- Site A: 500 individuals counted, 30 species found
- Site B: 100 individuals counted, 15 species found
Is Site A more diverse? Perhaps. But maybe more species were found simply
because more individuals were counted. To make a fair comparison, one must
"rarefy" to the same sample size.
### How does it work?
1. From the total N individuals, randomly select n (where n < N)
2. Count species abundances among the selected individuals
3. Calculate the chosen diversity index
4. Repeat this 100-200 times and take the average
5. Repeat for different values of n (e.g., 10, 20, 50, 100, ...)
The result is a curve:
```
Diversity
| ___________ <-- curve plateaus = sampling is sufficient
| /
| /
| / <-- rapid increase = still insufficient
| /
| /
|--/
+--------------------- Sample Size
```
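The five steps above can be sketched in base R. The community, the helper names, sampling without replacement, and the quantile-based 95% band are assumptions for illustration; `rarefaction_taxonomic()` may differ in all of these details.

```r
set.seed(1)
community <- c(sp1 = 30, sp2 = 25, sp3 = 20, sp4 = 10, sp5 = 5)  # hypothetical

shannon_sketch <- function(abund) {
  p <- abund[abund > 0] / sum(abund)
  -sum(p * log(p))
}

rarefy_sketch <- function(abund, sizes, n_rep = 100, index = shannon_sketch) {
  individuals <- rep(names(abund), times = abund)   # one entry per individual
  t(sapply(sizes, function(n) {                     # step 5: loop over sample sizes
    vals <- replicate(n_rep, {                      # step 4: repeat n_rep times
      draw <- sample(individuals, n)                # step 1: n individuals, no replacement
      index(table(draw))                            # steps 2-3: count species, compute index
    })
    c(n = n, mean = mean(vals),
      lower = unname(quantile(vals, 0.025)),        # 95% band from the resampled values
      upper = unname(quantile(vals, 0.975)))
  }))
}

curve <- rarefy_sketch(community, sizes = c(10, 20, 50, 90))
curve   # one row per sample size; the last row uses all 90 individuals
```

At n = N every draw contains the whole community, so the mean equals the full-sample index and the band collapses to a single value.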
### What is the confidence interval (CI)?
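The grey/blue band around the curve. At each sample size it marks the
range that contains about 95% of the index values obtained from the
repeated random subsamples. A narrow band = the estimate is stable at that
sample size. A wide band = the estimate varies strongly between subsamples.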
### Supported indices:
| index | Description |
|-------|-------------|
| `"species"` | Species richness (S) |
| `"shannon"` | Shannon H' |
| `"simpson"` | Gini-Simpson (1-D) |
| `"uTO"` | Ozkan unweighted taxonomic diversity |
| `"TO"` | Ozkan weighted taxonomic diversity |
| `"uTO_plus"` | Ozkan unweighted taxonomic distance |
| `"TO_plus"` | Ozkan weighted taxonomic distance |
| `"avtd"` | Clarke & Warwick AvTD |
**Visualization:** `plot_rarefaction(rare_result)`
**Related figures:**
- `rarefaction_shannon.png` - Shannon rarefaction curve
- `rarefaction_species.png` - Species richness rarefaction curve
- `rarefaction_uTO.png` - uTO rarefaction curve
- `rarefaction_avtd.png` - AvTD rarefaction curve
**References:**
Gotelli, N.J. & Colwell, R.K. (2001). Quantifying biodiversity:
procedures and pitfalls in the measurement and comparison of species
richness. *Ecology Letters*, 4, 379-391.
Hurlbert, S.H. (1971). The nonconcept of species diversity: a critique
and alternative parameters. *Ecology*, 52, 577-586.
Sanders, H.L. (1968). Marine benthic diversity: a comparative study.
*The American Naturalist*, 102, 243-282.
---
## 9. Visualizations
### 9a. Dendrogram (Taxonomic Tree)
**Function:** `plot_taxonomic_tree(tax_tree, community)`
Displays species as a tree based on their taxonomic distances.
Closely related species appear on nearby branches, distant species on
far branches. Coloring is based on taxonomic level (family, class, etc.).
**Related figures:**
- `dendrogram_family.png` - Colored by family
- `dendrogram_class.png` - Colored by class
- `dendrogram_no_abundance.png` - Without abundance, colored by order
### 9b. Heatmap
**Function:** `plot_heatmap(tax_tree)`
Displays the taxonomic distance matrix as colored tiles.
Dark colors = two species are closely related. Light colors = distant.
Species are ordered by hierarchical clustering, so similar species
appear side by side.
**Related figure:** `heatmap.png`
### 9c. Comparison Chart
**Function:** `compare_indices(communities, tax_tree, plot = TRUE)`
Compares multiple communities across all indices. Displayed as a
grouped bar plot.
**Key observation:** Abundance-dependent indices (Shannon, Simpson, Delta,
uTO, TO) differ between Diverse and Dominant, while abundance-independent
indices (AvTD, VarTD, uTO+, TO+) remain the same. This is because the
latter group only considers the species list, not abundance.
**Related figure:** `compare_indices_barplot.png`
### 9d. Iteration Plot (Run 2/3)
**Function:** `plot_iteration(resample_result)`
Shows the pTO value produced by each iteration in Run 2.
Red dashed line = Run 1 (deterministic) result.
Blue dashed line = maximum value.
**Related figures:**
- `iteration_run2_TO.png` - TO component iterations
- `iteration_run2_uTO_plus.png` - uTO+ component iterations
### 9e. Bubble Chart
**Function:** `plot_bubble(community, tax_tree)`
Each species is a bubble. X-axis = abundance, Y-axis = average taxonomic
distance, bubble size = species contribution to the community (abundance x distance).
Large bubbles in the upper right corner = species that are both abundant
and taxonomically distinct. These are the "important" species in the community.
**Related figure:** `bubble_family.png`
### 9f. Radar (Spider) Chart
**Function:** `plot_radar(communities, tax_tree)`
Compares multiple communities across all indices in a spider web format.
Each axis represents an index. Higher values extend outward. The overall
profile of communities can be seen at a glance.
**Related figure:** `radar_comparison.png`
### 9g. Rarefaction Plot
**Function:** `plot_rarefaction(rare_result)`
Displays the rarefaction curve with confidence intervals.
Red dashed line = total sample size (N).
Blue band = 95% confidence interval.
**Related figures:**
- `rarefaction_shannon.png`
- `rarefaction_species.png`
- `rarefaction_uTO.png`
- `rarefaction_avtd.png`
---
## File List
| File | Function | Description |
|------|----------|-------------|
| `dendrogram_family.png` | `plot_taxonomic_tree()` | Dendrogram colored by family |
| `dendrogram_class.png` | `plot_taxonomic_tree()` | Dendrogram colored by class |
| `dendrogram_no_abundance.png` | `plot_taxonomic_tree()` | Dendrogram without abundance |
| `compare_indices_barplot.png` | `compare_indices()` | 10-index comparison chart |
| `heatmap.png` | `plot_heatmap()` | Taxonomic distance heatmap |
| `iteration_run2_TO.png` | `plot_iteration()` | TO iteration plot |
| `iteration_run2_uTO_plus.png` | `plot_iteration()` | uTO+ iteration plot |
| `bubble_family.png` | `plot_bubble()` | Bubble chart |
| `radar_comparison.png` | `plot_radar()` | Radar (spider) chart |
| `rarefaction_shannon.png` | `plot_rarefaction()` | Shannon rarefaction curve |
| `rarefaction_species.png` | `plot_rarefaction()` | Species richness rarefaction curve |
| `rarefaction_uTO.png` | `plot_rarefaction()` | uTO rarefaction curve |
| `rarefaction_avtd.png` | `plot_rarefaction()` | AvTD rarefaction curve |
---
## References
### Classical Diversity Indices
- Shannon, C.E. (1948). A mathematical theory of communication.
*Bell System Technical Journal*, 27, 379-423.
- Simpson, E.H. (1949). Measurement of diversity. *Nature*, 163, 688.
- Pielou, E.C. (1966). The measurement of diversity in different types
of biological collections. *Journal of Theoretical Biology*, 13, 131-144.
### Taxonomic Distinctness (Clarke & Warwick)
- Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index
and its statistical properties. *Journal of Applied Ecology*, 35, 523-531.
- Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index
applicable to species lists: variation in taxonomic distinctness.
*Marine Ecology Progress Series*, 216, 265-278.
### Deng Entropy and Dempster-Shafer Theory
- Deng, Y. (2016). Deng entropy. *Chaos, Solitons & Fractals*, 91, 549-553.
- Dempster, A.P. (1967). Upper and lower probabilities induced by a
multivalued mapping. *The Annals of Mathematical Statistics*, 38, 325-339.
- Shafer, G. (1976). *A Mathematical Theory of Evidence*. Princeton
University Press.
### Ozkan pTO Index
- Ozkan, K. (2018). A new equation proposed for measuring taxonomic
diversity. *Turkish Journal of Forestry*, 19(4), 336-346.
DOI: 10.18182/tjf.441061
### Rarefaction
- Gotelli, N.J. & Colwell, R.K. (2001). Quantifying biodiversity:
procedures and pitfalls in the measurement and comparison of species
richness. *Ecology Letters*, 4, 379-391.
- Hurlbert, S.H. (1971). The nonconcept of species diversity: a critique
and alternative parameters. *Ecology*, 52, 577-586.
- Sanders, H.L. (1968). Marine benthic diversity: a comparative study.
*The American Naturalist*, 102, 243-282.
### General Ecology and Biodiversity
- Magurran, A.E. (2004). *Measuring Biological Diversity*. Blackwell
Publishing, Oxford.
- Whittaker, R.H. (1972). Evolution and measurement of species diversity.
*Taxon*, 21, 213-251.