piecemaker

Tools for Preparing Text for Tokenizers

About

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.
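The idea of shared preparation steps followed by a simple tokenizer can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the package's API. The function names `prepare_text` and `tokenize_space` are hypothetical, and the particular steps chosen (Unicode normalization, removal of control characters, spacing around punctuation, whitespace collapsing) are assumptions about what such preparation typically involves, not a description of what piecemaker does.

```python
import re
import string
import unicodedata

# Character class matching any single ASCII punctuation mark.
_PUNCT = re.compile("([" + re.escape(string.punctuation) + "])")


def prepare_text(text):
    """Hypothetical helper: common cleanup steps applied before tokenizing."""
    # Normalize to a composed Unicode form so equivalent strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Turn every whitespace character (tabs, newlines, ...) into a plain space.
    text = re.sub(r"\s", " ", text)
    # Drop remaining control and format characters (Unicode categories "C*").
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("C")
    )
    # Surround punctuation with spaces so each mark becomes its own piece.
    text = _PUNCT.sub(r" \1 ", text)
    # Collapse runs of spaces and trim both ends.
    return re.sub(r" +", " ", text).strip()


def tokenize_space(text):
    """Hypothetical simple tokenizer: prepare the text, then split on spaces."""
    prepared = prepare_text(text)
    return prepared.split(" ") if prepared else []


print(tokenize_space("Hello,\tworld!\x00  Bye."))
# ['Hello', ',', 'world', '!', 'Bye', '.']
```

Keeping the preparation in its own function means different tokenizers can reuse the same cleanup and differ only in how they split the prepared text.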

github.com/macmillancontentscience/piecemaker
macmillancontentscience.github.io/piecemaker/

Key Metrics

Version 1.0.2
R ≥ 2.10
Published 2023-06-02 (339 days ago)
Needs compilation? no
License Apache License (≥ 2)
CRAN checks piecemaker results

Downloads

Yesterday 8 0%
Last 7 days 74 -24%
Last 30 days 269 -5%
Last 90 days 826 -19%
Last 365 days 3,438 +11%

Maintainer

Jon Harmon

jonthegeek@gmail.com

Authors

Jon Harmon (aut, cre)
Jonathan Bratt (aut)
Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning (cph)

Material

README
NEWS
Reference manual
Package source

macOS

r-release (arm64)
r-oldrel (arm64)
r-release (x86_64)
r-oldrel (x86_64)

Windows

r-devel (x86_64): not available
r-release (x86_64)
r-oldrel (x86_64)

Old Sources

piecemaker archive

Depends

R ≥ 2.10

Imports

cli
glue
rlang ≥ 0.4.2
stringi
stringr

Suggests

covr
testthat ≥ 3.0.0

Reverse Imports

morphemepiece
wordpiece