blocking

Various Blocking Methods for Entity Resolution

CRAN Package

The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via the 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and integrates with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison. It also includes evaluation metrics for assessing blocking performance, including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) doi:10.1145/3377455, Steorts et al. (2014) doi:10.1007/978-3-319-11257-2_20, Dasylva and Goussanou (2021) https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002, Dasylva and Goussanou (2022) doi:10.1007/s42081-022-00153-3.
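
The description combines three ideas: character shingles, nearest-neighbour search over them, and pair-level error rates. The following is a minimal Python sketch of those ideas under stated assumptions. It is not the package's R API. Bigram shingles, Jaccard similarity, a brute-force 1-nearest-neighbour search standing in for the ANN step, a 0.5 similarity threshold, and connected components (via union-find) as blocks are all illustrative choices. The function names `shingles`, `jaccard`, `nn_blocks`, and `pair_error_rates` are hypothetical. The error rates here are computed against known true matches. The package's FPR/FNR estimators (Dasylva and Goussanou) work differently and are not reproduced here.

```python
from itertools import combinations


def shingles(text: str, n: int = 2) -> set[str]:
    """Character n-gram shingles of a lower-cased string with spaces removed."""
    s = text.lower().replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nn_blocks(records: list[str], n: int = 2, threshold: float = 0.5) -> list[set[int]]:
    """Hypothetical blocking sketch: link each record to its single most similar
    other record if similarity >= threshold (assumed cut-off), then return the
    connected components of that nearest-neighbour graph as blocks.
    Brute-force search here stands in for an ANN index."""
    sh = [shingles(r, n) for r in records]
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        best_j, best_sim = None, -1.0
        for j in range(len(records)):
            if j != i:
                sim = jaccard(sh[i], sh[j])
                if sim > best_sim:
                    best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            parent[find(i)] = find(best_j)

    groups: dict[int, set[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)


def pair_error_rates(blocks: list[set[int]], true_pairs: list[tuple[int, int]],
                     n_records: int) -> tuple[float, float]:
    """FPR and FNR of candidate pairs, assuming the true matching pairs are known."""
    candidates = {frozenset(p) for b in blocks for p in combinations(sorted(b), 2)}
    true = {frozenset(p) for p in true_pairs}
    negatives = n_records * (n_records - 1) // 2 - len(true)
    fpr = len(candidates - true) / negatives if negatives else 0.0
    fnr = len(true - candidates) / len(true) if true else 0.0
    return fpr, fnr


records = ["jan kowalski", "jan kowalsky", "anna nowak", "ana nowak", "piotr zielinski"]
blocks = nn_blocks(records)
print(blocks)  # [{0, 1}, {2, 3}, {4}]
print(pair_error_rates(blocks, [(0, 1), (2, 3)], len(records)))  # (0.0, 0.0)
```

Brute-force search needs O(n²) comparisons. Replacing the inner loop with an ANN index is what lets blocking scale to large files. In the package, that role is played by 'rnndescent', 'RcppHNSW', 'RcppAnnoy', or 'mlpack'.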

  • Version: 1.0.1
  • R version: R (≥ 4.1.0)
  • License: GPL-3
  • Needs compilation? No
  • Last release: 06/18/2025

Dependencies

  • Imports: 10 packages
  • Suggests: 4 packages