Leslie Huang, Patrick O. Perry, Finn Årup Nielsen, Martin Porter, Richard Boulton, The Regents of the University of California, Carlo Strapparava, Alessandro Valitutti, Unicode, Inc.

CRAN/E | corpus

Installation

About

Text corpus data analysis, with full support for international text (Unicode). Functions for reading data from newline-delimited 'JSON' files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies, including n-grams.
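The operations listed above (reading newline-delimited JSON, normalizing and tokenizing, locating term occurrences, and counting n-gram frequencies) can be illustrated with a small, language-neutral sketch. This is not the package's API. corpus is used from R, and its real functions are documented in the reference manual. The helper names `parse_ndjson`, `tokenize`, `ngrams`, `term_frequencies` and `locate` are hypothetical. The sketch also assumes that each JSON record has a `text` field. NFKC normalization plus case folding and a simple `\w+` tokenizer stand in for the package's own normalization and word-boundary rules, which may differ.

```python
import json
import re
import unicodedata
from collections import Counter

# Hypothetical helpers that illustrate the ideas only; not the corpus R API.


def parse_ndjson(lines):
    """Parse newline-delimited JSON: one JSON value per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def tokenize(text):
    """Normalize (NFKC, then case fold) and split into word tokens."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    # r"\w+" is Unicode-aware for str in Python 3, but it is a simplification,
    # not a full Unicode word-boundary algorithm (assumption for this sketch).
    return re.findall(r"\w+", normalized)


def ngrams(tokens, n):
    """Yield contiguous n-grams (as tuples) from one token sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def term_frequencies(texts, n=1):
    """Count n-gram occurrences across all texts; n-grams never span two texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


def locate(texts, term):
    """Return (text index, token index) for every occurrence of a term."""
    target = tokenize(term)  # normalize the query the same way as the texts
    k = len(target)
    hits = []
    if k == 0:
        return hits
    for doc_id, text in enumerate(texts):
        tokens = tokenize(text)
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] == target:
                hits.append((doc_id, i))
    return hits


if __name__ == "__main__":
    records = parse_ndjson([
        '{"text": "The quick brown fox."}',
        "",
        '{"text": "The QUICK red fox!"}',
    ])
    texts = [r["text"] for r in records]
    print(term_frequencies(texts, n=1)[("fox",)])         # 2
    print(term_frequencies(texts, n=2)[("the", "quick")])  # 2
    print(locate(texts, "fox"))                            # [(0, 3), (1, 3)]
```

Counting n-grams text by text, rather than over one concatenated token stream, keeps spurious n-grams such as ("fox", "the") from appearing across text boundaries.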

leslie-huang.github.io/r-corpus/
github.com/leslie-huang/r-corpus

Key Metrics

Version	0.10.2
R	≥ 3.3
Published	2021-05-02
Needs compilation?	yes
License	Apache License (== 2.0) | file LICENSE
CRAN checks	corpus results

Downloads

Yesterday	13 0%
Last 7 days	38 -55%
Last 30 days	288 -9%
Last 90 days	957 -7%
Last 365 days	23,755 -96%

Maintainer

Leslie Huang

lesliehuang@nyu.edu

Authors

Leslie Huang	cre / ctb
Patrick O. Perry	aut / cph
Finn Årup Nielsen	cph / dtc (AFINN Sentiment Lexicon)
Martin Porter and Richard Boulton	ctb / cph / dtc (Snowball Stemmer and Stopword Lists)
The Regents of the University of California	ctb / cph (Strtod Library Procedure)
Carlo Strapparava and Alessandro Valitutti	cph / dtc (WordNet-Affect Lexicon)
Unicode, Inc.	cph / dtc (Unicode Character Database)

Material

Reference manual
Package source

Vignettes

Chinese text handling
Introduction to corpus
Stemming Words
Text data in Corpus and other packages

macOS

r-release	arm64
r-oldrel	arm64
r-release	x86_64
r-oldrel	x86_64

Windows

r-devel	x86_64
r-release	x86_64
r-oldrel	x86_64

Old Sources

corpus archive

Depends

R	≥ 3.3

Imports

stats
utf8	≥ 1.1.0

Suggests

knitr
rmarkdown
Matrix
testthat

Enhances

quanteda
tm

Reverse Imports

GenEst
stylest

corpus

Text Corpus Analysis