
This function visualizes the proportion of missing data or the reporting rate for specified variables in a dataset. It creates a tile plot using ggplot2, where the x-axis represents a categorical time variable (e.g., year, month) and the y-axis represents either variables or groupings (e.g., state). The returned plot can be customized further to suit one's needs.


missing_plot(data, x_var, y_var = NULL, miss_vars = NULL, use_rep_rate = FALSE)



A data frame containing the data to be visualized. Must include the columns specified in 'x_var', 'y_var', and 'miss_vars'.


A character string specifying the time variable in 'data' (e.g., "year", "month"). Must be provided.


An optional character string specifying the grouping variable in 'data' (e.g., "state"). If provided, only one variable can be specified in 'miss_vars'.


An optional character vector specifying the variables to be visualized in 'data'. If NULL, all variables except 'x_var' and 'y_var' will be used.


A logical value. If TRUE, the reporting rate is visualized; otherwise, the proportion of missing data is visualized. Defaults to FALSE.
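To make the two quantities concrete, here is a minimal base R sketch of how a per-period proportion of missing values could be computed. The data frame `toy` and its columns are made up for illustration, and treating the reporting rate as the complement of the missing proportion is an assumption, not something stated in this documentation. It does not reproduce the internals of missing_plot().

```r
# Toy data (hypothetical values): two years, two indicator columns with NAs.
toy <- data.frame(
  year  = c(2020, 2020, 2020, 2021, 2021, 2021),
  cases = c(10, NA, 12, NA, NA, 7),
  tests = c(50, 55, NA, 60, 62, 64)
)

indicators <- c("cases", "tests")

# For each indicator, take the share of NA values within each year.
# The result is a matrix with one row per year and one column per indicator.
prop_missing <- sapply(indicators, function(col) {
  tapply(is.na(toy[[col]]), toy$year, mean)
})

# Assumption: reporting rate is the share of non-missing values.
rep_rate <- 1 - prop_missing

print(round(prop_missing, 2))
print(round(rep_rate, 2))
```

Each cell of such a matrix corresponds to one tile in the plot: the row gives the x-axis position (time) and the column gives the y-axis position (variable).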


A ggplot2 object representing the tile plot.
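Because the return value is a ggplot2 object, the usual ggplot2 layers can be added to it with `+`. The sketch below uses a hand-built stand-in tile plot rather than real missing_plot() output, so the data frame `toy_summary`, its column names, and the chosen colours are all assumptions for illustration; the same `+` pattern should apply to any ggplot2 object.

```r
library(ggplot2)

# Stand-in summary table (hypothetical values), one row per tile.
toy_summary <- data.frame(
  year         = factor(c(2020, 2021, 2020, 2021)),
  variable     = c("cases", "cases", "tests", "tests"),
  prop_missing = c(0.33, 0.67, 0.33, 0)
)

# Stand-in for a plot returned by a tile-plot function.
p <- ggplot(toy_summary, aes(x = year, y = variable, fill = prop_missing)) +
  geom_tile(colour = "white")

# Customize after the fact: fill scale, labels, and theme.
p_custom <- p +
  scale_fill_gradient(low = "white", high = "firebrick", limits = c(0, 1)) +
  labs(
    title = "Missing data by year",
    x = "Year",
    y = NULL,
    fill = "Proportion missing"
  ) +
  theme_minimal()
```

If the plot already defines a fill scale, adding another one replaces it and ggplot2 prints a message saying so.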


# get path
path <- system.file(
         package = "epiCleanr")

fake_epi_df_togo <- import(path)

# Check missing data by year
result <- missing_plot(fake_epi_df_togo,
             x_var = "year", use_rep_rate = FALSE)