tokenizers.bpe

Byte Pair Encoding Text Tokenization

Installation
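
A minimal sketch of installing the package from an R session. The first call installs the released version from CRAN. The commented-out alternative assumes the `remotes` package is available and uses the GitHub repository listed on this page. Because the package needs compilation, installing from source requires a working C++ toolchain.

```r
# Install the released version from CRAN
install.packages("tokenizers.bpe")

# Alternative (assumption: the 'remotes' package is installed):
# install the development version from the GitHub repository
# remotes::install_github("bnosac/tokenizers.bpe")
```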

About

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library, an implementation of fast Byte Pair Encoding (BPE).
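
The following sketch shows one way the package might be used: training a BPE model on the `belgium_parliament` example corpus that ships with the package, then encoding a sentence into subwords and ids. The parameter values (`coverage`, `vocab_size`, `threads`), the temporary model path, and the sample sentence are illustrative choices, not recommendations from this page.

```r
library(tokenizers.bpe)

# Example corpus shipped with the package; keep only the French texts
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")

# Train a BPE model; the model file is written to a temporary location
# (illustrative path, chosen here to avoid writing to the working directory)
model_file <- file.path(tempdir(), "youtokentome.bpe")
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1,
             model_path = model_file)

# Encode new text as subword strings and as integer ids
text <- "L'appartement est grand & vraiment bien situe en plein centre"
subwords <- bpe_encode(model, x = text, type = "subwords")
ids <- bpe_encode(model, x = text, type = "ids")

# Map the ids of the first text back to text
decoded <- bpe_decode(model, ids[[1]])
```

Both `bpe_encode()` calls return a list with one element per input text, so `subwords[[1]]` holds the subword pieces of the sentence above. A saved model can be reloaded later with `bpe_load_model(model_file)`.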

github.com/bnosac/tokenizers.bpe

Key Metrics

Version 0.1.3
R ≥ 2.10
Published 2023-09-15 (236 days ago)
Needs compilation? yes
License MPL-2.0
CRAN checks tokenizers.bpe results

Downloads

Yesterday 13 (0%)
Last 7 days 83 (-34%)
Last 30 days 313 (+12%)
Last 90 days 874 (-27%)
Last 365 days 3,880 (+3%)

Maintainer

Jan Wijffels

jwijffels@bnosac.be

Authors

Jan Wijffels: aut / cre / cph (R wrapper)
BNOSAC: cph (R wrapper)
VK.com: cph
Gregory Popovitch: ctb / cph (Files at src/parallel_hashmap (Apache License, Version 2.0))
The Abseil Authors: ctb / cph (Files at src/parallel_hashmap (Apache License, Version 2.0))
Ivan Belonogov: ctb / cph (Files at src/youtokentome (MIT License))

Material

README
NEWS
Reference manual
Package source

In Views

NaturalLanguageProcessing

macOS

r-release arm64
r-oldrel arm64
r-release x86_64
r-oldrel x86_64

Windows

r-devel x86_64
r-release x86_64
r-oldrel x86_64

Old Sources

tokenizers.bpe archive

Depends

R ≥ 2.10

Imports

Rcpp ≥ 0.11.5

LinkingTo

Rcpp

Reverse Suggests

doc2vec
sentencepiece
textrecipes