Skip to contents

This function visualizes the proportion of missing data or reporting rate for specified variables in a dataset. It creates a tile plot using ggplot2; where the x-axis can represent any categorical time such as time (e.g., year, month), and the y-axis can represents either variables or groupings (e.g., state). The output can further be manipulated to one's needs.

Usage

missing_plot(data, x_var, y_var = NULL, miss_vars = NULL, use_rep_rate = FALSE)

Arguments

data

A data frame containing the data to be visualized. Must include columns specified in 'x_var', 'y_var', and 'vars'.

x_var

A character string specifying the time variable in 'data' (e.g., "year", "month"). Must be provided.

y_var

An optional character string specifying the grouping variable in 'data' (e.g., "state"). If provided, only one variable can be specified in 'vars'.

miss_vars

An optional character vector specifying the variables to be visualized in 'data'. If NULL, all variables except 'x_var' and 'y_var' will be used.

use_rep_rate

A logical value. If TRUE, the reporting rate is visualized; otherwise, the proportion of missing data is visualized. Defaults to FALSE

Value

A ggplot2 object representing the tile plot.

Examples



# get path
path <- system.file(
        "extdata",
        "fake_epi_df_togo.rds",
         package = "epiCleanr")

fake_epi_df_togo <- import(path)

# Check misisng data by year
result <- missing_plot(fake_epi_df_togo,
             x_var = "year", use_rep_rate = FALSE)