corpustools

Managing, Querying and Analyzing Tokenized Text

About

Provides text analysis in R built around a tokenized text format. In this format, the position of every token is preserved, and each token can be annotated (e.g., with part-of-speech tags or dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, export to a document-term matrix (DTM) for compatibility with many text analysis packages, and reconstruction of the original text from its tokens to facilitate interpretation.
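
To make the idea of a tokenized text format concrete, here is a minimal conceptual sketch in base R. It does not use the corpustools API: the column names (`doc_id`, `token_id`, `token`, `feature`) and the two toy documents are assumptions chosen for illustration only. It shows why keeping token positions matters: annotations attach to individual tokens, the text can be rebuilt from them, and a DTM is a simple cross-tabulation.

```r
# Conceptual sketch in base R; this is NOT the corpustools API.
# The documents and column names below are hypothetical.
texts <- c(d1 = "The cat sat", d2 = "The dog sat down")

# Split each text on single spaces and record every token's position
split_tokens <- strsplit(texts, " ", fixed = TRUE)
n_tokens <- lengths(split_tokens)
tokens <- data.frame(
  doc_id   = rep(names(split_tokens), n_tokens),
  token_id = sequence(n_tokens),
  token    = unlist(split_tokens, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Token-level annotation: a lowercased form here; a part-of-speech
# tag would be stored in the same way, one value per token
tokens$feature <- tolower(tokens$token)

# Because positions are kept, the text can be rebuilt per document
rebuilt <- tapply(tokens$token, tokens$doc_id, paste, collapse = " ")

# Document-term matrix: documents in rows, features in columns
dtm <- unclass(table(tokens$doc_id, tokens$feature))
```

The package itself stores tokens in a richer structure and adds querying on top; the sketch only mirrors the general layout described above.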

github.com/kasperwelbers/corpustools

Key Metrics

Version: 0.5.1
R: ≥ 3.5.0
Published: 2023-05-08 (352 days ago)
Needs compilation: yes
License: GPL-3
CRAN checks: corpustools results

Downloads

Yesterday: 103 (+368%)
Last 7 days: 264 (+14%)
Last 30 days: 847 (+17%)
Last 90 days: 2,367 (-43%)
Last 365 days: 12,083 (-7%)

Maintainer

Kasper Welbers

kasperwelbers@gmail.com

Authors

Kasper Welbers
Wouter van Atteveldt

Material

README
NEWS
Reference manual
Package source

Vignettes

Introduction to corpustools

macOS

r-release (arm64)
r-oldrel (arm64)
r-release (x86_64)
r-oldrel (x86_64)

Windows

r-devel (x86_64)
r-release (x86_64)
r-oldrel (x86_64)

Old Sources

corpustools archive

Depends

R ≥ 3.5.0

Imports

methods
wordcloud ≥ 2.5
stringi
Rcpp ≥ 0.12.12
R6
udpipe ≥ 0.8.3
digest
data.table ≥ 1.10.4
quanteda ≥ 1.5.1
igraph
tokenbrowser ≥ 0.1.5
RNewsflow ≥ 1.2.1
Matrix ≥ 1.2
parallel
pbapply ≥ 1.4
rsyntax ≥ 0.1.1

Suggests

testthat
tm ≥ 0.6
topicmodels
knitr
rmarkdown

LinkingTo

Rcpp
RcppProgress

Reverse Imports

text2sdg

Reverse Suggests

LexisNexisTools