Crandore Hub

polyRAD

Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

Read depth data from genotyping-by-sequencing (GBS) or restriction site-associated DNA sequencing (RAD-seq) are imported and used to make Bayesian probability estimates of genotypes in polyploids or diploids. The genotype probabilities, posterior mean genotypes, or most probable genotypes can then be exported for downstream analysis. 'polyRAD' is described by Clark et al. (2019) <doi:10.1534/g3.118.200913>, and the Hind/He statistic for marker filtering is described by Clark et al. (2022) <doi:10.1186/s12859-022-04635-9>. A variant calling pipeline for highly duplicated genomes is also included and is described by Clark et al. (2020, Version 1) <doi:10.1101/2020.01.11.902890>.

README

# Python helper scripts for polyRAD

## Using Python

If you are new to running Python from the command line, and are using RStudio,
the simplest thing to do is click on the "Terminal" tab to get to your
operating system's terminal/shell/command prompt.  Python scripts
should be run from there, not from the R Console.

[Python 3](https://www.python.org/) is required for the scripts in this
directory to function.  However, you may have Python 2 as the default Python on
your computer, since many computer programs and even some operating systems
depend on it.  To check, run

``` bash
python --version
```

You should see something like `Python 3.x.x`, where there are numbers in
place of `x`.  If you see something else, you may need to install Python 3
and/or make sure the path to Python 3 is in your system's `PATH` variable.
If you need help with that, your department's IT person can probably get it
done in five or ten minutes (on CentOS it was more of a challenge, but on
Windows it should be quick).  On some operating systems, you can type
`python3` or `python36` instead of `python` to specify which version of
Python to use, if multiple versions are installed.
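
If it is unclear which interpreter a given command starts, the version can
also be checked from inside Python itself.  The following is a minimal sketch
for illustration only; `is_python3` is a hypothetical helper name and is not
part of polyRAD.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3 or later."""
    # Hypothetical helper for illustration; not part of polyRAD.
    return version_info[0] >= 3


if __name__ == "__main__":
    # Report the version of the interpreter that is actually running.
    print("Running Python", sys.version.split()[0])
    if not is_python3():
        sys.exit("Python 3 is required for the scripts in this directory.")
```

Saving this under a name of your choice and running it with `python`,
`python3`, etc. shows which version each command launches.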

## Find GBS/RAD tags associated with alleles from TASSEL

If you used `VCF2RADdata`, with `phaseSNPs = TRUE` and a non-null `refgenome`
argument, to import a VCF that was generated by the TASSEL-GBSv2 pipeline,
the script `tassel_vcf_tags.py` can help you to find the full tag sequence(s)
associated with each allele.

If `obj` is the name of a `RADdata` object in your R environment, run the
following in R:

``` R
cat(GetAlleleNames(obj), sep = "\n", file = "myalleles.txt")
```

Then in the Terminal, run

``` bash
python tassel_vcf_tags.py -a myalleles.txt -s alignment_from_tassel.sam -o mytags.txt
```

where `alignment_from_tassel.sam` is the SAM file created by Bowtie2 or BWA
as part of the TASSEL-GBSv2 pipeline.

The file `mytags.txt` is tab-delimited.  The first column contains the allele
names from polyRAD.  The second column contains the tag sequences, starting at
the restriction cut site.  If multiple tag sequences matched the allele, they
are separated by a semicolon (`;`).  Note that if a tag aligned to the
bottom strand, the sequence seen in the allele name may be the reverse
complement of the sequence seen in the tag.
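
As a rough sketch of how this file might be read back into Python, the
functions below parse the two columns and check a sequence against a tag on
either strand.  The function names are hypothetical and not part of polyRAD,
and the sketch assumes the file has no header row; skip the first line if
yours has one.

```python
import csv

# Complement table for A/C/G/T/N in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_tags(lines):
    """Parse tab-delimited lines into a dict: allele name -> list of tags.

    `lines` can be an open file or any iterable of strings.
    Assumes no header row (an assumption, not confirmed by the README).
    """
    tags = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        allele, tag_field = row[0], row[1]
        # Multiple tags for one allele are separated by semicolons.
        tags[allele] = [t for t in tag_field.split(";") if t]
    return tags


def sequence_in_tag(seq, tag):
    """True if `seq` occurs in `tag` on either the top or bottom strand."""
    return seq in tag or reverse_complement(seq) in tag
```

For example, `read_tags(open("mytags.txt"))` would return the tags keyed by
allele name, and `sequence_in_tag` can then be used to confirm that an allele
sequence is found in a tag even when the tag aligned to the bottom strand.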

In my own data, this script has been successful in identifying tags for about
90% of alleles.  The rest can be attributed to quirks in how TASSEL determines
SNP locations, as well as errors in the phasing performed by `VCF2RADdata`.
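
To see what proportion of alleles received tags in your own data, something
like the following sketch could be used.  `tag_coverage` is a hypothetical
helper, not part of polyRAD, and it assumes that an allele with no matching
tag is either absent from `mytags.txt` or has an empty second column.

```python
def tag_coverage(allele_lines, tag_lines):
    """Return (fraction of alleles with at least one tag, alleles lacking tags).

    allele_lines: lines of myalleles.txt (one allele name per line).
    tag_lines: lines of mytags.txt (allele name, tab, semicolon-separated tags).
    """
    alleles = [ln.strip() for ln in allele_lines if ln.strip()]
    with_tags = set()
    for ln in tag_lines:
        fields = ln.rstrip("\n").split("\t")
        # An allele counts as found only if its tag column is non-empty.
        if len(fields) >= 2 and fields[1].strip():
            with_tags.add(fields[0])
    missing = [a for a in alleles if a not in with_tags]
    fraction = (len(alleles) - len(missing)) / len(alleles) if alleles else 0.0
    return fraction, missing


# Example usage with the files created above:
# with open("myalleles.txt") as a, open("mytags.txt") as t:
#     fraction, missing = tag_coverage(a, t)
#     print("Alleles with tags: {:.1%}".format(fraction))
```

The returned list of alleles lacking tags can be a starting point for checking
whether particular loci are affected.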

## Adjust tag alignments in highly duplicated genomes

The files `process_sam_multi.py`, `isoloci_fun.py`, and `process_isoloci.py` are
intended to assist with assigning tags to correct genomic locations in highly
duplicated reference genomes, such as those of recent or ancient allopolyploids.
See the vignette "Variant and Genotype Calling in Highly Duplicated Genomes"
(`isolocus_sorting.Rmd`) for more information.

Versions across snapshots

| Version | Repository | File | Size |
|---|---|---|---|
| 2.0.1 | rolling linux/jammy R-4.5 | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | rolling linux/noble R-4.5 | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | rolling source/ R- | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | latest linux/jammy R-4.5 | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | latest linux/noble R-4.5 | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | latest source/ R- | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | 2026-04-26 source/ R- | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.1 | 2026-04-23 source/ R- | polyRAD_2.0.1.tar.gz | 2.3 MiB |
| 2.0.0 | 2025-04-20 source/ R- | polyRAD_2.0.0.tar.gz | 2.3 MiB |
