Author: Daniela Petruzalek
Email: [email protected]
License: AGPLv3
read.dbc is an R package designed to handle the DBC file format, a proprietary compressed database format used by the Brazilian Ministry of Health (DATASUS) for public healthcare datasets.
It provides functionality to:
- Read .dbc files directly into R data frames (read.dbc).
- Decompress .dbc files into standard .dbf files (dbc2dbf).
- Compress standard .dbf files into .dbc format (dbf2dbc) (New in v1.2.0!).
This project is based on the work of Mark Adler (blast) and Pablo Fonseca (blast-dbf).
Note: This project is not affiliated with the Brazilian government.
The package has undergone a major overhaul to ensure stability and performance:
- Thread Safety: Complete refactoring of the C codebase to remove global state, making it safe for parallel execution (e.g., mclapply).
- Compression Support: Added experimental support for creating .dbc files from .dbf files.
- Robustness: Improved error handling, buffer management (fixing stack overflows with large files), and memory safety.
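
As a rough sketch of the parallel use case mentioned above, the snippet below reads several files with parallel::mclapply and stacks the results. It assumes the package is installed. The repeated sample file, the core count, and the variable names are illustrative choices, not part of the package.

```r
library(parallel)
library(read.dbc)

# Assumption: the bundled sample file stands in for a set of real DATASUS files.
sample_file <- system.file("files/sids.dbc", package = "read.dbc")
files <- rep(sample_file, 4)

# mclapply relies on forking, which is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Read each file in a separate worker, then stack the resulting data frames.
tables <- mclapply(files, read.dbc, mc.cores = n_cores)
combined <- do.call(rbind, tables)

message("Rows read: ", nrow(combined))
```

With real data, files would typically be a vector of paths to downloaded .dbc files that share the same columns, so that rbind can combine them.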
For those interested in the technical details of the proprietary DBC format or the compression algorithms used:
- DBC File Format Specification: A high-level overview of the file structure and compression logic.
- Internals & Algorithms: A deep dive into the "Implode" algorithm, bit stream encoding, and implementation details.
- Changelog: Detailed history of changes.
The stable version is available on CRAN:
install.packages("read.dbc")

To install the latest development version from GitHub:
# install.packages("devtools")
devtools::install_github("danicat/read.dbc")

library(read.dbc)
# Read a sample DBC file included in the package
sids <- read.dbc(system.file("files/sids.dbc", package="read.dbc"))
# str() prints the structure itself; wrapping it in print() would also print NULL
str(sids)
print(summary(sids))

# Example: Downloading "Declarations of Death" for Parana state, 2013
url <- "ftp://ftp.datasus.gov.br/dissemin/publicos/SIM/CID10/DORES/DOPR2013.dbc"
tryCatch({
  download.file(url, destfile = "DOPR2013.dbc", mode = "wb")
  dopr <- read.dbc("DOPR2013.dbc")
  head(dopr)
}, error = function(e) {
  message("Could not download or read the file: ", e$message)
})

in.f <- system.file("files/sids.dbc", package = "read.dbc")
out.f <- tempfile(fileext = ".dbf")
if( dbc2dbf(input.file = in.f, output.file = out.f) ) {
  message("File decompressed to: ", out.f)
}

New in v1.2.0
# Using the DBF created in the previous step
dbc.f <- tempfile(fileext = ".dbc")
if( dbf2dbc(input.file = out.f, output.file = dbc.f) ) {
  message("File compressed to: ", dbc.f)
}

- Requirements: R, RStudio (optional), C compiler (gcc/clang).
- Commands:
  - make setup: Install dependencies.
  - make check: Run R CMD check.
  - make test: Run unit tests.
  - make clean: Clean build artifacts.
  - make help: List all available commands.
If you have questions or issues, please open an issue on GitHub or contact the author at [email protected].