wordpiece.data

Data for Wordpiece-Style Tokenization

Installation
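
The package is published on CRAN, so the standard installation route should apply. A minimal sketch, assuming a CRAN mirror is already configured for the R session; the final call only lists whatever the package exports and does not assume any particular object names:

```r
# Install the released version from CRAN (assumes a configured CRAN mirror).
install.packages("wordpiece.data")

# Attach the package and list the objects it exports,
# without assuming what they are called.
library(wordpiece.data)
ls("package:wordpiece.data")
```

The package is data-only (no compilation needed) and is intended to be used through the wordpiece package, which imports it.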

About

Provides data to be used by the wordpiece algorithm to tokenize text into somewhat meaningful chunks. The included vocabularies were retrieved from their upstream sources and parsed into an R-friendly format.

github.com/macmillancontentscience/wordpiece.data

Key Metrics

Version 2.0.0
R ≥ 3.5.0
Published 2022-03-03
Needs compilation? no
License Apache License (≥ 2)
CRAN checks wordpiece.data results

Downloads

Yesterday 18 0%
Last 7 days 67 -6%
Last 30 days 223 -7%
Last 90 days 632 -23%
Last 365 days 2,473 -14%

Maintainer

Jon Harmon

jonthegeek@gmail.com

Authors

Jonathan Bratt

aut

Jon Harmon

aut / cre

Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning

cph

Google, Inc

cph

(original BERT vocabularies)

Material

README
NEWS
Reference manual
Package source

macOS

r-release arm64
r-oldrel arm64
r-release x86_64
r-oldrel x86_64

Windows

r-devel x86_64
r-release x86_64
r-oldrel x86_64

Old Sources

wordpiece.data archive

Depends

R ≥ 3.5.0

Suggests

testthat ≥ 3.0.0

Reverse Imports

wordpiece