morphemepiece

Morpheme Tokenization

About

Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
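The two-stage flow described above (lookup table first, wordpiece fallback second) can be sketched in a few lines. This is only an illustration, written in Python rather than R, and it is not the package's implementation: the lookup table, the vocabulary, the `##` continuation marker, the `[UNK]` token, and all function names are hypothetical, and the fallback shown is the standard greedy longest-match wordpiece, not the modified variant the package uses.

```python
# Hypothetical toy data; neither table comes from morphemepiece.data.
LOOKUP = {
    "unhappiness": ["un", "happy", "ness"],
    "cats": ["cat", "s"],
}
VOCAB = {"un", "happy", "ness", "cat", "s", "dog", "##s"}
UNK = "[UNK]"  # assumed placeholder for words that cannot be split


def wordpiece(word, vocab, unk=UNK):
    """Standard greedy longest-match-first wordpiece (assumed fallback)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark non-initial pieces
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: give up on the whole word.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize_word(word, lookup=LOOKUP, vocab=VOCAB):
    """Use the lookup table if the word is known, else fall back."""
    if word in lookup:
        return list(lookup[word])
    return wordpiece(word, vocab)


def tokenize(text):
    """Lowercase, split on whitespace, and tokenize each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(tokenize_word(word))
    return tokens
```

With this toy data, "unhappiness" and "cats" are resolved by the lookup table, while "dogs" is absent from it and goes through the fallback.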

github.com/macmillancontentscience/morphemepiece

Key Metrics

Version 1.2.3
Published 2022-04-16
Needs compilation? no
License Apache License (≥ 2)
CRAN checks morphemepiece results

Downloads

Yesterday 6 0%
Last 7 days 58 +7%
Last 30 days 251 -11%
Last 90 days 778 -13%
Last 365 days 3,074 -4%

Maintainer

Jonathan Bratt

jonathan.bratt@macmillan.com

Authors

Jonathan Bratt (aut, cre)
Jon Harmon (aut)
Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning (cph)

Material

README
NEWS
Reference manual
Package source

Vignettes

Testing the fall-through algorithm
Generating a Vocabulary and Lookup

macOS

r-release (arm64)
r-oldrel (arm64)
r-release (x86_64)
r-oldrel (x86_64)

Windows

r-devel (x86_64)
r-release (x86_64)
r-oldrel (x86_64)

Old Sources

morphemepiece archive

Imports

dlr ≥ 1.0.0
fastmatch
magrittr
memoise ≥ 2.0.0
morphemepiece.data
piecemaker ≥ 1.0.0
purrr ≥ 0.3.4
readr
rlang
stringr ≥ 1.4.0

Suggests

dplyr
fs
ggplot2
here
knitr
remotes
rmarkdown
testthat ≥ 3.0.0
utils