tok

Fast Text Tokenization

About

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm. It is extremely fast both at training new vocabularies and at tokenizing texts.
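As a rough illustration of what training a 'Byte-Pair Encoding' vocabulary involves, the following is a minimal sketch in plain Python. It is not the implementation used by tok or by the 'Hugging Face' tokenizers library. The function names (`merge_pair`, `train_bpe`, `encode`), the character-level starting symbols, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the lexicographically largest
        # pair so that the result is deterministic (an arbitrary choice here).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the new merge rule to every word in the working vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize `word` by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
tokens = encode("lowest", merges)  # ['low', 'e', 's', 't']
```

Production tokenizers typically start from bytes rather than characters and handle pre-tokenization and special tokens, which this sketch leaves out.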

github.com/mlverse/tok
System requirements: Rust toolchain with cargo, libclang/llvm-config
Bug report: File report

Key Metrics

Version: 0.1.1
R: ≥ 4.2.0
Published: 2023-08-17
Needs compilation: yes
License: MIT
License File
CRAN checks: tok results

Downloads

Yesterday: 14 (0%)
Last 7 days: 60 (-17%)
Last 30 days: 197 (+40%)
Last 90 days: 426 (-21%)
Last 365 days: 1,646

Maintainer

Daniel Falbel

daniel@posit.co

Authors

Daniel Falbel (aut, cre)
Posit (cph)

Material

README
NEWS
Reference manual
Package source

macOS

r-release: arm64
r-oldrel: arm64
r-release: x86_64
r-oldrel: x86_64

Windows

r-devel: x86_64
r-release: x86_64
r-oldrel: x86_64

Old Sources

tok archive

Depends

R ≥ 4.2.0

Imports

R6
cli

Suggests

rmarkdown
testthat ≥ 3.0.0
hfhub
withr