Crandore Hub

NUSS

Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into their component words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
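
To illustrate the unigram part of the description, here is a minimal sketch in Python of picking the most probable word sequence with dynamic programming. It is not the package's implementation or API: the function names (`segment`, `word_log_prob`), the `max_word_len` parameter, and the toy counts in `UNIGRAM_COUNTS` are all hypothetical and chosen only for illustration. The sketch assumes each word's probability is its count divided by the total count, and that a sequence's probability is the product of its word probabilities.

```python
import math

# Hypothetical toy unigram counts; a real dictionary would be built from a corpus.
UNIGRAM_COUNTS = {
    "this": 50, "is": 80, "a": 100, "test": 20,
    "hash": 10, "tag": 15, "hashtag": 5, "the": 120,
}
TOTAL = sum(UNIGRAM_COUNTS.values())


def word_log_prob(word):
    """Log-probability of a word under the toy unigram model, or None if unknown."""
    count = UNIGRAM_COUNTS.get(word)
    if count is None:
        return None
    return math.log(count / TOTAL)


def segment(text, max_word_len=20):
    """Return the most probable list of dictionary words covering text, or None."""
    text = text.lower()
    n = len(text)
    # best[i] holds (log_prob, words) for the best segmentation of text[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            if best[start] is None:
                continue  # the prefix text[:start] cannot be segmented
            lp = word_log_prob(text[start:end])
            if lp is None:
                continue  # the candidate word is not in the dictionary
            score = best[start][0] + lp
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [text[start:end]])
    return None if best[n] is None else best[n][1]


if __name__ == "__main__":
    print(segment("thisisatest"))
    print(segment("hashtag"))
```

With these toy counts, "hashtag" stays a single word, because one moderately frequent word scores higher than the product of the probabilities of "hash" and "tag". Input that cannot be covered entirely by dictionary words yields `None`.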

Versions across snapshots

Version  Repository  File  Size
0.1.0 rolling linux/jammy R-4.5 NUSS_0.1.0.tar.gz 855.7 KiB
0.1.0 rolling linux/noble R-4.5 NUSS_0.1.0.tar.gz 858.1 KiB
0.1.0 rolling source/ R- NUSS_0.1.0.tar.gz 292.7 KiB
0.1.0 latest linux/jammy R-4.5 NUSS_0.1.0.tar.gz 855.7 KiB
0.1.0 latest linux/noble R-4.5 NUSS_0.1.0.tar.gz 858.1 KiB
0.1.0 latest source/ R- NUSS_0.1.0.tar.gz 292.7 KiB
0.1.0 2026-04-26 source/ R- NUSS_0.1.0.tar.gz 292.7 KiB
0.1.0 2026-04-23 source/ R- NUSS_0.1.0.tar.gz 292.7 KiB
0.1.0 2026-04-09 windows/windows R-4.5 NUSS_0.1.0.zip 1.1 MiB
0.1.0 2025-04-20 source/ R- NUSS_0.1.0.tar.gz 292.7 KiB

Dependencies (latest)

Imports

LinkingTo

Suggests