tokenizers

Fast, Consistent Tokenization of Natural Language Text

About

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
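
A brief usage sketch of the consistent interface described above, using functions from the package's documented API (each tokenizer takes a character vector and returns a list of token vectors):

```r
library(tokenizers)

text <- "Tokenizers converts text into tokens. It has a consistent interface."

tokenize_words(text)          # word tokens
tokenize_sentences(text)      # sentence tokens
tokenize_ngrams(text, n = 2)  # shingled bigrams
count_words(text)             # word count per document
```

See the package vignettes below for the full interface, including stemming, skip n-grams, and Penn Treebank tokenization.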

Citation: tokenizers citation info
Documentation: docs.ropensci.org/tokenizers/
Source: github.com/ropensci/tokenizers
Bug reports: file a report

Key Metrics

Version 0.3.0
R ≥ 3.1.3
Published 2022-12-22
Needs compilation? yes
License MIT + file LICENSE
CRAN checks tokenizers results

Downloads

Yesterday 1,330 (+4%)
Last 7 days 7,867 (−12%)
Last 30 days 32,666 (−3%)
Last 90 days 92,267 (−2%)
Last 365 days 393,107 (−22%)

Maintainer

Lincoln Mullen

lincoln@lincolnmullen.com

Authors

Lincoln Mullen (aut / cre)
Os Keyes (ctb)
Dmitriy Selivanov (ctb)
Jeffrey Arnold (ctb)
Kenneth Benoit (ctb)

Material

README
NEWS
Reference manual
Package source

In Views

NaturalLanguageProcessing

Vignettes

Introduction to the tokenizers Package
The Text Interchange Formats and the tokenizers Package

macOS

r-release (arm64)
r-oldrel (arm64)
r-release (x86_64)
r-oldrel (x86_64)

Windows

r-devel (x86_64)
r-release (x86_64)
r-oldrel (x86_64)

Old Sources

tokenizers archive

Depends

R ≥ 3.1.3

Imports

stringi ≥ 1.0.1
Rcpp ≥ 0.12.3
SnowballC ≥ 0.5.1

Suggests

covr
knitr
rmarkdown
stopwords ≥ 0.9.0
testthat

LinkingTo

Rcpp

Reverse Imports

covfefe
deeplr
DeepPINCS
DramaAnalysis
epitweetr
pdfsearch
proustr
rslp
textfeatures
textrecipes
tidypmc
tidytext
ttgsea
wactor

Reverse Suggests

cwbtools
edgarWebR
quanteda
torchdatasets