Crandore Hub

wordpiece.data

Data for Wordpiece-Style Tokenization

Provides data used by the wordpiece algorithm to tokenize text into somewhat meaningful chunks. The included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.
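
To show how a vocabulary of this kind is typically consumed, here is a minimal Python sketch of greedy longest-match-first wordpiece splitting. It is not this package's implementation. The function name `wordpiece_tokenize`, its parameters, and the toy vocabulary are hypothetical. It assumes the BERT convention that continuation pieces are prefixed with `##` and that unmatched words map to `[UNK]`.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of a single word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the prefix (assumed "##").
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: treat the whole word as unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary for illustration only; the real BERT vocabularies are far larger.
toy_vocab = {"token", "##ization", "##ize", "un", "##known", "[UNK]"}

print(wordpiece_tokenize("tokenization", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

With a real vocabulary, the set would be filled from the entries of one of the vocab.txt files listed above, one token per line.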

Versions across snapshots

Version Repository File Size
2.0.0 rolling linux/jammy R-4.5 wordpiece.data_2.0.0.tar.gz 288.3 KiB
2.0.0 rolling linux/noble R-4.5 wordpiece.data_2.0.0.tar.gz 288.3 KiB
2.0.0 rolling source/ R- wordpiece.data_2.0.0.tar.gz 281.5 KiB
2.0.0 latest linux/jammy R-4.5 wordpiece.data_2.0.0.tar.gz 288.3 KiB
2.0.0 latest linux/noble R-4.5 wordpiece.data_2.0.0.tar.gz 288.3 KiB
2.0.0 latest source/ R- wordpiece.data_2.0.0.tar.gz 281.5 KiB
2.0.0 2026-04-26 source/ R- wordpiece.data_2.0.0.tar.gz 281.5 KiB
2.0.0 2026-04-23 source/ R- wordpiece.data_2.0.0.tar.gz 281.5 KiB
2.0.0 2026-04-09 windows/windows R-4.5 wordpiece.data_2.0.0.zip 291.8 KiB
2.0.0 2025-04-20 source/ R- wordpiece.data_2.0.0.tar.gz 281.5 KiB

Dependencies (latest)

Suggests