Crandore Hub

boilerpipeR

Interface to the Boilerpipe Java Library

Generic extraction of the main text content from HTML files; removes ads, sidebars and headers using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.
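As a rough illustration of the extraction described above, the sketch below passes an HTML string to one of the package's extractors. It assumes boilerpipeR and a working Java runtime are installed; the HTML snippet is made up for the example, and ArticleExtractor is only one of several extractors the package provides, so another may suit a given page better.

```r
# Assumption: boilerpipeR and a Java runtime (used through rJava) are installed.
library(boilerpipeR)

# Hypothetical input: a small page with navigation, an article body and a footer.
article_body <- paste(
  rep("This sentence stands in for the main article text.", 20),
  collapse = " "
)
html <- paste0(
  "<html><body>",
  "<div id='nav'><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div id='article'><h1>Example headline</h1><p>", article_body, "</p></div>",
  "<div id='footer'>Copyright notice</div>",
  "</body></html>"
)

# ArticleExtractor takes the HTML as a character string and returns the
# text that the boilerpipe heuristics classify as main content.
main_text <- ArticleExtractor(html)
cat(main_text)
```

What counts as boilerplate is decided by the heuristics, so the result on a page this small may differ from what a real article page would give.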

Versions across snapshots

Version  Repository                        File                   Size
1.3.2    2026-04-09 windows/windows R-4.5  boilerpipeR_1.3.2.zip  1.5 MiB

Dependencies (latest)

Imports

Suggests