wordpiece.data

Data for Wordpiece-Style Tokenization

Provides data for the wordpiece algorithm to use when tokenizing text into somewhat meaningful chunks. The included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.
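For readers unfamiliar with how such a vocabulary is consumed, here is a minimal Python sketch of greedy longest-match-first WordPiece, the procedure used with the BERT vocabularies linked above. It is not this package's R interface. The BERT `vocab.txt` files list one token per line, the 0-based line index serves as the token id, and continuation pieces carry a `##` prefix. The function names, the tiny toy vocabulary, and the whitespace-only pre-tokenization are assumptions for illustration (BERT's own basic tokenizer also splits punctuation and, for the uncased model, strips accents); the 100-character word cap mirrors the Hugging Face BERT tokenizer default.

```python
from typing import Dict, List, Optional


def load_vocab(lines: List[str]) -> Dict[str, int]:
    """Map each token to its 0-based line index (BERT vocab.txt: one token per line)."""
    vocab: Dict[str, int] = {}
    for index, line in enumerate(lines):
        token = line.rstrip("\r\n")
        if token:  # skip blank lines but keep the original line numbering
            vocab[token] = index
    return vocab


def wordpiece_word(word: str, vocab: Dict[str, int],
                   unk: str = "[UNK]", max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    if len(word) > max_chars:
        return [unk]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match: Optional[str] = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def wordpiece_tokenize(text: str, vocab: Dict[str, int],
                       lowercase: bool = True) -> List[str]:
    """Whitespace pre-tokenization (an assumption) followed by WordPiece."""
    if lowercase:  # rough stand-in for the uncased vocabulary's lowercasing
        text = text.lower()
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(wordpiece_word(word, vocab))
    return tokens


# Hypothetical toy vocabulary standing in for a real vocab.txt.
toy_lines = ["[PAD]", "[UNK]", "the", "un", "##aff", "##able", "play", "##ing"]
vocab = load_vocab(toy_lines)
print(wordpiece_tokenize("The unaffable playing", vocab))
# ['the', 'un', '##aff', '##able', 'play', '##ing']
```

Mapping a word to a single `[UNK]` as soon as any remainder cannot be matched, rather than keeping the pieces found so far, follows the behavior of the reference BERT tokenizer.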

Versions across snapshots

| Version | Snapshot | Platform | R | File | Size |
|---|---|---|---|---|---|
| `2.0.0` | rolling | linux/jammy | 4.5 | `wordpiece.data_2.0.0.tar.gz` | 288.3 KiB |
| `2.0.0` | rolling | linux/noble | 4.5 | `wordpiece.data_2.0.0.tar.gz` | 288.3 KiB |
| `2.0.0` | rolling | source | | `wordpiece.data_2.0.0.tar.gz` | 281.5 KiB |
| `2.0.0` | latest | linux/jammy | 4.5 | `wordpiece.data_2.0.0.tar.gz` | 288.3 KiB |
| `2.0.0` | latest | linux/noble | 4.5 | `wordpiece.data_2.0.0.tar.gz` | 288.3 KiB |
| `2.0.0` | latest | source | | `wordpiece.data_2.0.0.tar.gz` | 281.5 KiB |
| `2.0.0` | 2026-04-26 | source | | `wordpiece.data_2.0.0.tar.gz` | 281.5 KiB |
| `2.0.0` | 2026-04-23 | source | | `wordpiece.data_2.0.0.tar.gz` | 281.5 KiB |
| `2.0.0` | 2026-04-09 | windows | 4.5 | `wordpiece.data_2.0.0.zip` | 291.8 KiB |
| `2.0.0` | 2025-04-20 | source | | `wordpiece.data_2.0.0.tar.gz` | 281.5 KiB |

Dependencies (latest)

Suggests

testthat (>= 3.0.0)