CRAN/E | corpus

corpus

Text Corpus Analysis

About

Text corpus data analysis, with full support for international text (Unicode). Functions for reading data from newline-delimited 'JSON' files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies, including n-grams.

leslie-huang.github.io/r-corpus/
github.com/leslie-huang/r-corpus

Key Metrics

Version 0.10.2
R ≥ 3.3
Published 2021-05-02 (1089 days ago)
Needs compilation? yes
License Apache License (== 2.0)
License File
CRAN checks corpus results

Downloads

Period         Downloads  Change
Yesterday      13         0%
Last 7 days    38         -55%
Last 30 days   288        -9%
Last 90 days   957        -7%
Last 365 days  23,755     -96%

Maintainer

Leslie Huang

lesliehuang@nyu.edu

Authors

Leslie Huang (cre / ctb)
Patrick O. Perry (aut / cph)
Finn Årup Nielsen (cph / dtc): AFINN Sentiment Lexicon
Martin Porter, Richard Boulton (ctb / cph / dtc): Snowball Stemmer and Stopword Lists
The Regents of the University of California (ctb / cph): Strtod Library Procedure
Carlo Strapparava, Alessandro Valitutti (cph / dtc): WordNet-Affect Lexicon
Unicode, Inc. (cph / dtc): Unicode Character Database

Material

Reference manual
Package source

Vignettes

Chinese text handling
Introduction to corpus
Stemming Words
Text data in Corpus and other packages

macOS

r-release: arm64
r-oldrel: arm64
r-release: x86_64
r-oldrel: x86_64

Windows

r-devel: x86_64
r-release: x86_64
r-oldrel: x86_64

Old Sources

corpus archive

Depends

R ≥ 3.3

Imports

stats
utf8 ≥ 1.1.0

Suggests

knitr
rmarkdown
Matrix
testthat

Enhances

quanteda
tm

Reverse Imports

GenEst
stylest