BioUtils
Biological Data Analysis and Visualization
Provides tools for the analysis and visualization of gene expression data from the NCBI Gene Expression Omnibus (GEO). Implements a complete workflow including data import, quality control, differential expression analysis, co-expression network analysis, pathway enrichment, and multi-gene biomarker discovery. Differential expression uses the empirical Bayes moderated t-statistic of Smyth (2004) <doi:10.2202/1544-6115.1027>. Gene set enrichment analysis follows Subramanian et al. (2005) <doi:10.1073/pnas.0506580102>. Multi-gene biomarker selection uses the LASSO method of Tibshirani (1996) <doi:10.1111/j.2517-6161.1996.tb02080.x>. Effect sizes are computed as Cohen's d following Cohen (1988).
README
# BioUtils
**BioUtils** is an end-to-end R toolkit for analyzing gene expression data from GEO datasets.
It provides a unified workflow for differential expression, statistical testing, visualization, and biological interpretation.
---
## Features
* Load and preprocess GEO datasets
* Differential expression analysis using limma
* Visualization (PCA, volcano plots, gene-level plots)
* Statistical testing (adaptive t-test, effect size, bootstrapped CI)
* Gene set enrichment analysis (GSEA)
* Machine learning (LASSO biomarker selection)
* Gene co-expression analysis
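
The effect size and bootstrapped CI items above can be illustrated in base R. This is only a sketch on simulated data: `cohens.d.sketch` and `bootstrap.ci.sketch` are hypothetical helper names, not BioUtils functions. The pooled-SD form of Cohen's d and the percentile bootstrap are assumptions about how such statistics are commonly computed, not a description of the package's internals.

```r
# Hypothetical helpers for illustration; these are not BioUtils functions.

# Cohen's d for two independent groups, using the pooled standard deviation.
cohens.d.sketch <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled.sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled.sd
}

# Percentile bootstrap confidence interval for Cohen's d.
# Each group is resampled with replacement, independently of the other.
bootstrap.ci.sketch <- function(x, y, n.boot = 2000, level = 0.95) {
  boot.d <- replicate(n.boot, {
    cohens.d.sketch(sample(x, replace = TRUE), sample(y, replace = TRUE))
  })
  alpha <- 1 - level
  quantile(boot.d, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
disease <- rnorm(20, mean = 8, sd = 1)  # simulated log2 expression values
control <- rnorm(20, mean = 7, sd = 1)

d  <- cohens.d.sketch(disease, control)
ci <- bootstrap.ci.sketch(disease, control)
print(d)
print(ci)
```

A wide interval signals that the effect size estimate is unstable at the given sample size, even when the point estimate looks large.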
---
## Installation
```r
install.packages("remotes")
remotes::install_github("spencertreadway/BioUtils")
```
---
## Example Workflow
```r
# Load data
eset <- load.geo.soft("GDS507.soft", log.transform = TRUE)
geo <- extract.expression(eset)
# PCA visualization
pca.plot(geo$expression, geo$phenotype, color.by = "disease.state")
# Differential expression
de.results <- run.limma.de(geo)
# Volcano plot
volcano.plot(de.results, fc.threshold = 0.3)
# Select top genes
top.genes <- head(rownames(de.results[order(de.results$adj.P.Val), ]), 5)
probe.ids <- find.probe.by.gene(geo$gene, top.genes)
# Single gene analysis
expr <- get.gene.expression(geo$expression, probe.ids[1])
df <- build.analysis.df(expr, geo$phenotype, geo$gene)
gene.analysis.plot(df)
# LASSO model
phenotype.binary <- ifelse(geo$phenotype$disease.state == "disease", 1, 0)
lasso.fit <- fit.lasso(geo$expression, phenotype.binary)
```
---
## Interpretation
BioUtils integrates multiple layers of analysis:
* **PCA** reveals global structure in the data
* **Differential expression (limma)** identifies genes whose expression differs significantly between groups
* **Effect size & CI** quantify the magnitude of those differences and the uncertainty around them
* **LASSO** selects predictive biomarkers
* **GSEA** links results to biological pathways
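
As a rough illustration of how PCA exposes global structure, the sketch below runs base R's `prcomp()` on a simulated expression matrix. The data, the group labels, and the size of the simulated shift are all assumptions made for the example. It does not show how `pca.plot()` is implemented.

```r
set.seed(42)
n.genes   <- 200
n.samples <- 12
group     <- rep(c("control", "disease"), each = 6)

# Simulated log2 expression matrix: genes in rows, samples in columns.
expr <- matrix(rnorm(n.genes * n.samples, mean = 8, sd = 1),
               nrow = n.genes, ncol = n.samples,
               dimnames = list(paste0("gene", seq_len(n.genes)),
                               paste0("sample", seq_len(n.samples))))

# Shift the first 40 genes upward in the disease samples.
expr[1:40, group == "disease"] <- expr[1:40, group == "disease"] + 3

# prcomp() expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of total variance captured by each principal component.
var.explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, labeled by group.
scores <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group)
print(round(var.explained[1:3], 3))
print(scores)
```

In this simulation the two groups fall on opposite sides of zero along PC1, which is the kind of separation a PCA plot colored by phenotype is meant to reveal. In real data, batch effects can produce the same pattern.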
---
## Documentation
Full documentation is available at https://spencertreadway.github.io/BioUtils/ or via:
```r
help(package = "BioUtils")
```
---
## Notes
* Probe-to-gene mapping depends on GEO platform annotations
* Fold-change thresholds are user-defined and dataset-dependent
* Statistical significance does not always imply biological relevance
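
The interplay between fold-change thresholds and significance can be seen on a toy results table. The sketch below assumes a data frame with the `logFC` and `adj.P.Val` columns that limma's `topTable()` produces, with `logFC` on the log2 scale. The values are invented, and `select.de.sketch` is a hypothetical helper, not a BioUtils function.

```r
# Toy results table using limma's topTable() column names; values are invented.
de.toy <- data.frame(
  logFC     = c(1.20, -0.80, 0.35, -0.25, 0.10, 0.90),
  adj.P.Val = c(0.001, 0.010, 0.030, 0.040, 0.600, 0.200),
  row.names = paste0("probe", 1:6)
)

# Keep probes that pass both an absolute log fold-change cutoff
# and an adjusted p-value cutoff.
select.de.sketch <- function(results, fc.threshold, alpha = 0.05) {
  keep <- abs(results$logFC) >= fc.threshold & results$adj.P.Val < alpha
  rownames(results)[keep]
}

lenient <- select.de.sketch(de.toy, fc.threshold = 0.3)  # probe1, probe2, probe3
strict  <- select.de.sketch(de.toy, fc.threshold = 1.0)  # probe1
print(lenient)
print(strict)
```

Here `probe4` is significant but has a small fold change, while `probe6` has a large fold change but is not significant. Neither passes both filters, which is why the threshold should be chosen with the dataset and the biological question in mind.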
---
## License
MIT
---
## Author
Spencer Treadway
Versions across snapshots
| Version | Repository | File | Size |
|---|---|---|---|
| 0.1.3 | rolling linux/jammy R-4.5 | BioUtils_0.1.3.tar.gz | 2.3 MiB |
| 0.1.3 | rolling linux/noble R-4.5 | BioUtils_0.1.3.tar.gz | 2.3 MiB |
| 0.1.3 | rolling source/ R- | BioUtils_0.1.3.tar.gz | 2.3 MiB |
| 0.1.3 | latest linux/jammy R-4.5 | BioUtils_0.1.3.tar.gz | 2.3 MiB |
| 0.1.3 | latest linux/noble R-4.5 | BioUtils_0.1.3.tar.gz | 2.3 MiB |
| 0.1.3 | latest source/ R- | BioUtils_0.1.3.tar.gz | 2.3 MiB |
| 0.1.3 | 2026-04-23 source/ R- | BioUtils_0.1.3.tar.gz | 0 B |