Title: | Automatic Data Validation and Reporting |
---|---|
Description: | Validate a dataset by columns and rows using convenient predicates inspired by the 'assertr' package. Generate a good-looking HTML report or print console output to display in the logs of your data processing pipeline. |
Authors: | Marcin Dubel [aut, cre], Paweł Przytuła [aut], Jakub Nowicki [aut], Krystian Igras [aut], Dominik Krzeminski [ctb], Servet Ahmet Çizmeli [ctb], Appsilon Sp. z o.o. [cph] |
Maintainer: | Marcin Dubel |
License: | MIT + file LICENSE |
Version: | 0.2.1 |
Built: | 2025-02-12 02:55:23 UTC |
Source: | https://github.com/appsilon/data.validator |
Adds validation results to the report object, aggregating a summary of success, error and warning checks. It also parses the assertr result attributes and stores them in a usable table.
add_results(data, report)
data | Data that was validated. |
report | Report object to store validation results. |
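A minimal sketch of where add_results sits in a validation chain. It assumes the magrittr package is attached to provide the `%>%` pipe; the `mtcars` dataset and the single check are illustrative choices, not part of this reference.

```r
library(magrittr)
library(data.validator)

# Create an empty report object that will collect the results.
report <- data_validation_report()

# Validate the built-in mtcars data and store the outcome in the report.
validate(mtcars, name = "mtcars check") %>%
  validate_if(drat > 0, description = "drat is positive") %>%
  add_results(report)

# Printing the report shows the aggregated summary of checks.
print(report)
```

The report object is modified in place, so the same `report` can be reused to collect results from several validated datasets.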
Creates a new report object: an R6 class environment responsible for storing validation results.
data_validation_report()
The result holds information about successful, failed and warning assertions in a table that stores the key details of the validation results. Its columns are:
table_name - name of validated table
assertion.id - id used for each assertion
description - assertion description
num.violations - number of violations (assertion and column specific)
call - assertion call
message - assertion result message for specific column
type - error, warning or success
error_df - nested table storing details about error or warning result (like violated indexes and values)
get_results(report, unnest = FALSE)
report | Report object that stores validation results. See add_results. |
unnest | If TRUE, the error_df table is unnested, which duplicates the remaining columns for each unnested row. |
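The sketch below shows one way to inspect the results table, both nested and unnested. It assumes assertr (for the `in_set` predicate) and magrittr (for `%>%`) are attached; the check is deliberately chosen to fail on `mtcars` so that error_df has content.

```r
library(assertr)
library(magrittr)
library(data.validator)

report <- data_validation_report()

# vs and am contain 0 and 1 in mtcars, so this check records violations.
validate(mtcars, name = "mtcars check") %>%
  validate_cols(in_set(c(0, 2)), vs, am,
                description = "vs and am equal 0 or 2 only") %>%
  add_results(report)

# Nested form: violation details stay inside the error_df column.
results <- get_results(report)
results[, c("table_name", "description", "type", "num.violations")]

# Unnested form: violation details are expanded into regular columns.
unnested <- get_results(report, unnest = TRUE)
names(unnested)
```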
Renders the content of the simple report version, which prints the validation_results table.
render_raw_report_ui( validation_results, success = TRUE, warning = TRUE, error = TRUE )
validation_results | Validation results table (see get_results). |
success | Should success results be presented? |
warning | Should warning results be presented? |
error | Should error results be presented? |
Renders content of semantic report version.
render_semantic_report_ui( validation_results, success = TRUE, warning = TRUE, error = TRUE, df_error_head_n = 6L )
validation_results | Validation results table (see get_results). |
success | Should success results be presented? |
warning | Should warning results be presented? |
error | Should error results be presented? |
df_error_head_n | Number of rows to display in error table. Works in the same way as |
Saving results as an HTML report
save_report( report, output_file = "validation_report.html", output_dir = getwd(), ui_constructor = render_semantic_report_ui, template = system.file("rmarkdown/templates/standard/skeleton/skeleton.Rmd", package = "data.validator"), ... )
report | Report object that stores validation results. |
output_file | HTML file name to write the report to. |
output_dir | Target report directory. |
ui_constructor | Function of |
template | Path to Rmd template in which ui_constructor is rendered. See |
... | Additional parameters passed to |
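A hedged sketch of saving a report with the default semantic UI and with the raw UI. It assumes magrittr is attached for `%>%` and that pandoc is available for rendering (checked with `rmarkdown::pandoc_available()`); the file names and the temporary output directory are illustrative.

```r
library(magrittr)
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "mtcars check") %>%
  validate_if(drat > 0, description = "drat is positive") %>%
  validate_if(mpg > 15, description = "mpg is above 15") %>%
  add_results(report)

# Write into a temporary directory so the working directory stays clean.
out_dir <- tempdir()

# Rendering needs pandoc, so skip quietly when it is not installed.
if (rmarkdown::pandoc_available()) {
  # Default: semantic report UI.
  save_report(report,
              output_file = "validation_report.html",
              output_dir = out_dir)

  # Alternative: the simple (raw) report UI.
  save_report(report,
              output_file = "raw_report.html",
              output_dir = out_dir,
              ui_constructor = render_raw_report_ui)
}
```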
Saving results table to external file
save_results(report, file_name = "results.csv", method = utils::write.csv, ...)
report | Report object that stores validation results. See get_results. |
file_name | Name of the resulting file (including extension). |
method | Function that should be used to save the results table (write.csv by default). The function passed to |
... | Remaining parameters passed to |
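As an illustration, the results table can be written to a CSV file with the default method. The sketch assumes assertr and magrittr are attached; the failing check and the temporary file path are illustrative choices.

```r
library(assertr)
library(magrittr)
library(data.validator)

report <- data_validation_report()

# A check that fails for some rows, so the saved table contains violations.
validate(mtcars, name = "mtcars check") %>%
  validate_cols(within_bounds(0, 30), mpg,
                description = "mpg is between 0 and 30") %>%
  add_results(report)

# Save with the default method (utils::write.csv) to a temporary file.
csv_path <- file.path(tempdir(), "results.csv")
save_results(report, file_name = csv_path)

# Read the file back to confirm what was written.
saved <- utils::read.csv(csv_path)
head(saved)
```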
Saves the print(validator) output to a text file.
save_summary( report, file_name = "validation_log.txt", success = TRUE, warning = TRUE, error = TRUE )
report | Report object that stores validation results. |
file_name | Name of the resulting file (including extension). |
success | Should success results be presented? |
warning | Should warning results be presented? |
error | Should error results be presented? |
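A small sketch of writing the printed summary to a log file, for example at the end of a pipeline step. It assumes magrittr is attached for `%>%`; the log path is a temporary file chosen for illustration.

```r
library(magrittr)
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "mtcars check") %>%
  validate_if(drat > 0, description = "drat is positive") %>%
  validate_if(mpg > 15, description = "mpg is above 15") %>%
  add_results(report)

# Write the summary to a text file, leaving out the successful checks.
log_path <- file.path(tempdir(), "validation_log.txt")
save_summary(report, file_name = log_path, success = FALSE)

# Inspect the saved log.
log_lines <- readLines(log_path)
cat(log_lines, sep = "\n")
```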
Prepares data for validation and report generation. The function prepares data for chain validation and ensures that all validation results are gathered correctly. It also attaches additional information to the data (name and description), which is then displayed in the validation report.
validate(data, name, description = NULL)
data | data.frame or tibble to test |
name | name of validation object (will be displayed in the report) |
description | description of validation object (will be displayed in the report) |
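The following sketch shows validate used as the entry point of a chain, with both a name and a description. It assumes magrittr is attached for `%>%`; the label texts are illustrative.

```r
library(magrittr)
library(data.validator)

# Wrap the data first; the name and description are shown in the report.
prepared <- validate(
  mtcars,
  name = "Motor Trend cars",
  description = "Built-in mtcars dataset used for illustration"
)

# The wrapped data can then be piped through validation steps.
report <- data_validation_report()

prepared %>%
  validate_if(cyl > 0, description = "cyl is positive") %>%
  add_results(report)

print(report)
```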
Validation on columns
validate_cols( data, predicate, ..., obligatory = FALSE, description = NA, skip_chain_opts = FALSE, success_fun = assertr::success_append, error_fun = assertr::error_append, defect_fun = assertr::defect_append )
data | A data.frame or tibble to test |
predicate | Predicate function or predicate generator such as |
... | Columns selection that |
obligatory | If TRUE and the assertion fails, the data is marked as defective. For defective data, all subsequent rules are handled by the defect_fun function. |
description | A character string describing the assertion. The description is then displayed in the validation report. |
skip_chain_opts | While wrapping data with validate function, |
success_fun | Function that is called when the validation passes |
error_fun | Function that is called when the validation fails |
defect_fun | Function that is called when the data is marked as defective |
See also: validate_if, validate_rows
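A minimal sketch of column-wise validation. It assumes assertr is attached to supply the predicates (`in_set`, `within_bounds`) and magrittr to supply `%>%`; the columns and bounds are illustrative.

```r
library(assertr)
library(magrittr)
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "mtcars columns") %>%
  # The same predicate is applied to each selected column.
  validate_cols(in_set(c(0, 1)), vs, am,
                description = "vs and am are binary") %>%
  # A second rule on a single column.
  validate_cols(within_bounds(0, Inf), mpg,
                description = "mpg is non-negative") %>%
  add_results(report)

get_results(report)
```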
Checks whether all the logical values returned by the expression are TRUE. It is meant for cases that cannot be expressed with the validate_cols and validate_rows functions.
validate_if( data, expr, description = NA, obligatory = FALSE, skip_chain_opts = FALSE, success_fun = assertr::success_append, error_fun = assertr::error_append, defect_fun = assertr::defect_append )
data | A data.frame or tibble to test |
expr | A logical expression to test for, e.g. |
description | A character string describing the assertion. The description is then displayed in the validation report. |
obligatory | If TRUE and the assertion fails, the data is marked as defective. For defective data, all subsequent rules are handled by the defect_fun function. |
skip_chain_opts | While wrapping data with validate function, |
success_fun | Function that is called when the validation passes |
error_fun | Function that is called when the validation fails |
defect_fun | Function that is called when the data is marked as defective |
See also: validate_cols, validate_rows
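As a sketch of the kind of rule validate_if is suited to, the example below compares two columns with each other, which a single-column predicate does not cover. It assumes magrittr is attached for `%>%`; the expressions are illustrative.

```r
library(magrittr)
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "mtcars expressions") %>%
  # Simple condition on one column.
  validate_if(drat > 0, description = "drat is positive") %>%
  # Condition that relates two columns to each other.
  validate_if(hp > cyl, description = "hp exceeds cyl in every row") %>%
  add_results(report)

get_results(report)
```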
Validation on rows
validate_rows( data, row_reduction_fn, predicate, ..., obligatory = FALSE, description = NA, skip_chain_opts = FALSE, success_fun = assertr::success_append, error_fun = assertr::error_append, defect_fun = assertr::defect_append )
data | A data.frame or tibble to test |
row_reduction_fn | Function that should reduce rows into a single column that is passed to validation, e.g. |
predicate | Predicate function or predicate generator such as |
... | Columns selection that |
obligatory | If TRUE and the assertion fails, the data is marked as defective. For defective data, all subsequent rules are handled by the defect_fun function. |
description | A character string describing the assertion. The description is then displayed in the validation report. |
skip_chain_opts | While wrapping data with validate function, |
success_fun | Function that is called when the validation passes |
error_fun | Function that is called when the validation fails |
defect_fun | Function that is called when the data is marked as defective |
See also: validate_cols, validate_if
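A minimal sketch of row-wise validation. It assumes assertr is attached to supply the row reduction function (`num_row_NAs`) and the predicate (`within_bounds`), and magrittr to supply `%>%`; the selected columns and the allowed number of missing values are illustrative.

```r
library(assertr)
library(magrittr)
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "mtcars rows") %>%
  # Count NAs per row across the selected columns, then check the count.
  validate_rows(num_row_NAs, within_bounds(0, 2), vs, am, mpg,
                description = "at most 2 missing values per row") %>%
  add_results(report)

get_results(report)
```

The row reduction function turns each row into a single value, and the predicate is then applied to that reduced column.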