boilerpipeR

Interface to the Boilerpipe Java Library

Installation
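
The package is distributed through CRAN, so the standard installer should work, assuming it is still available from your mirror. Because boilerpipeR imports rJava, a Java runtime most likely needs to be installed before the package will load.

```r
# Install from CRAN; rJava is pulled in as a dependency.
install.packages("boilerpipeR")
```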

About

Generic extraction of the main text content from HTML files, removing ads, sidebars, and headers with the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.
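
As a minimal sketch of what extraction looks like in practice: boilerpipeR provides one R function per boilerpipe extractor, such as `ArticleExtractor()` and `DefaultExtractor()`, each taking the HTML as a character string. The HTML below is a made-up page for illustration, and the exact text returned depends on boilerpipe's heuristics, so no output is shown.

```r
library(boilerpipeR)

# Hypothetical page: a navigation bar and a footer around one article body.
article_text <- paste(
  rep("This sentence stands in for the body of a long article.", 20),
  collapse = " "
)
html <- paste0(
  "<html><head><title>Example</title></head><body>",
  "<div id='nav'><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div id='article'><h1>Boilerplate removal</h1>",
  "<p>", article_text, "</p></div>",
  "<div id='footer'>Copyright notice | Privacy | Imprint</div>",
  "</body></html>"
)

# ArticleExtractor is tuned for news-style articles;
# DefaultExtractor is the more generic alternative.
extract <- ArticleExtractor(html)
cat(extract)
```

For live pages, the HTML string can be fetched first, for example with `getURL()` from RCurl, which the package lists under Suggests.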

github.com/mannau/boilerpipeR

Key Metrics

Version: 1.3.2
Published: 2021-05-19
Needs compilation? no
License: Apache License (== 2.0)
CRAN checks: boilerpipeR results

Downloads

Yesterday: 8 (0%)
Last 7 days: 57 (-5%)
Last 30 days: 304 (+0%)
Last 90 days: 845 (-23%)
Last 365 days: 3,686 (-30%)

Maintainer

Mario Annau

mario.annau@gmail.com

Authors

See AUTHORS file.

Material

NEWS
Reference manual
Package source

In Views

NaturalLanguageProcessing
WebTechnologies

Vignettes

Introduction to the tm.plugin.webmining Package

macOS

r-release (arm64)
r-oldrel (arm64)
r-release (x86_64)
r-oldrel (x86_64)

Windows

r-devel (x86_64)
r-release (x86_64)
r-oldrel (x86_64)

Old Sources

boilerpipeR archive

Imports

rJava

Suggests

RCurl