wordpiece.data
Data for Wordpiece-Style Tokenization
Provides data used by the wordpiece algorithm to tokenize text into somewhat meaningful chunks. The included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.
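To make the role of the vocabulary concrete, here is a minimal sketch of greedy longest-match-first wordpiece splitting. It is written in Python for illustration only and is not this package's API: the function name `wordpiece_tokenize`, the toy vocabulary, and the `##` continuation prefix and `[UNK]` token (both conventions of BERT-style vocabularies) are assumptions made for this example.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into vocabulary pieces, greedy longest match first.

    Hypothetical helper for illustration; not part of wordpiece.data.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (assumed); the real BERT vocabularies hold tens of thousands of tokens.
toy_vocab = {"token", "##ize", "##s", "un", "##known", "[UNK]"}

print(wordpiece_tokenize("tokenizes", toy_vocab))  # ['token', '##ize', '##s']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```

With a real vocabulary the lookup works the same way; only the set of known pieces changes, which is why the cased and uncased vocabularies can split the same text differently.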
Versions across snapshots
| Version | Repository | File | Size |
|---|---|---|---|
| 2.0.0 | rolling linux/jammy R-4.5 | wordpiece.data_2.0.0.tar.gz | 288.3 KiB |
| 2.0.0 | rolling linux/noble R-4.5 | wordpiece.data_2.0.0.tar.gz | 288.3 KiB |
| 2.0.0 | rolling source/ R- | wordpiece.data_2.0.0.tar.gz | 281.5 KiB |
| 2.0.0 | latest linux/jammy R-4.5 | wordpiece.data_2.0.0.tar.gz | 288.3 KiB |
| 2.0.0 | latest linux/noble R-4.5 | wordpiece.data_2.0.0.tar.gz | 288.3 KiB |
| 2.0.0 | latest source/ R- | wordpiece.data_2.0.0.tar.gz | 281.5 KiB |
| 2.0.0 | 2026-04-26 source/ R- | wordpiece.data_2.0.0.tar.gz | 281.5 KiB |
| 2.0.0 | 2026-04-23 source/ R- | wordpiece.data_2.0.0.tar.gz | 281.5 KiB |
| 2.0.0 | 2026-04-09 windows/windows R-4.5 | wordpiece.data_2.0.0.zip | 291.8 KiB |
| 2.0.0 | 2025-04-20 source/ R- | wordpiece.data_2.0.0.tar.gz | 281.5 KiB |
Dependencies (latest)
Suggests
- testthat (>= 3.0.0)