blocking
Various Blocking Methods for Entity Resolution
The 'blocking' package provides blocking methods for record linkage and deduplication based on approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations through the 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and integrates with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for blocking performance, including estimates of the false positive rate (FPR) and false negative rate (FNR). For details see: Papadakis et al. (2020) doi:10.1145/3377455; Steorts et al. (2014) doi:10.1007/978-3-319-11257-2_20; Dasylva and Goussanou (2021) https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002; Dasylva and Goussanou (2022) doi:10.1007/s42081-022-00153-3.
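The idea behind shingle-based blocking can be illustrated with a minimal Python sketch. It is not the package's implementation or API: the function names `shingles`, `jaccard`, and `nearest_neighbour_blocks` are hypothetical, Jaccard similarity is an assumed choice of measure, and an exact nearest-neighbour search stands in for the ANN libraries listed above. Each record is linked to its most similar other record, and the connected components of the resulting graph are taken as blocks.

```python
def shingles(text, n=2):
    """Return the set of character n-grams (shingles) of a string,
    after lower-casing it and removing spaces."""
    s = text.lower().replace(" ", "")
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbour_blocks(records, n=2):
    """Link every record to its most similar other record and return
    the connected components of that graph as lists of record indices."""
    sets = [shingles(r, n) for r in records]
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        candidates = [k for k in range(len(records)) if k != i]
        if not candidates:
            continue
        # Exact search (assumption): an ANN index would approximate this step.
        j = max(candidates, key=lambda k: jaccard(sets[i], sets[k]))
        parent[find(i)] = find(j)

    blocks = {}
    for i in range(len(records)):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


records = ["john smith", "jon smith", "maria garcia", "maria garcya"]
print(nearest_neighbour_blocks(records))
```

On this toy input the two spelling variants of each name end up in the same block, so detailed record comparison would only be needed within blocks rather than across all pairs.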
- GitHub
- https://ncn-foreigners.ue.poznan.pl/blocking/
- File a bug report
- blocking results
- blocking.pdf
- Version: 1.0.1
- R version: R (≥ 4.1.0)
- License: GPL-3
- Needs compilation? No
- Last release: 06/18/2025
Team
Maciej Beręsewicz (Maintainer)
Adam Struzik (Roles: Author, ctr)
Insights
Downloads per day, shown as line graphs for the last 30 days and the last 365 days.
Data provided by CRAN
Dependencies
- Imports: 10 packages
- Suggests: 4 packages