R Programming Complete Tutorial

Missing data treatment

This section explains different approaches to treating missing data:

Listwise deletion:

Deleting observations with missing values is rarely a good first choice. Suppose we have a dataset of 10 variables. If an observation is missing a value in just one variable, listwise deletion drops the whole row, so the 9 observed values in the other variables, which may contain useful information, are lost as well. This method is best avoided unless the data are missing completely at random (MCAR), and it should only be used if a sufficient amount of data remains after those observations are deleted.

Example:
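
A minimal sketch of listwise deletion in base R. The data frame `df` and its columns (`age`, `income`, `score`) are made up for illustration and are not from the tutorial; `na.omit()` and `complete.cases()` are the base R functions for dropping incomplete rows.

```r
# Hypothetical toy data frame with a few missing values (NA)
df <- data.frame(
  age    = c(25, NA, 31, 47, 52),
  income = c(3200, 4100, NA, 5800, 6100),
  score  = c(7.1, 6.4, 8.0, 5.5, 9.2)
)

# Count the missing values in each variable
colSums(is.na(df))

# Listwise deletion: keep only the rows with no missing value in any column
complete_df <- na.omit(df)

# Equivalent approach using a logical row index
complete_df2 <- df[complete.cases(df), ]

# Compare the number of rows before and after deletion
nrow(df)
nrow(complete_df)
```

Note that the row missing only `age` also loses its observed `income` and `score` values, which is the information loss described above.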

Imputation (Mean, Median, Random,…):

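It is often preferable to deletion because no observations are discarded: each missing value is replaced with an estimate, such as the mean or median of the variable or a random draw from its observed values. However, simple imputation can misrepresent the relationship of the variable with other variables in the dataset.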

  • Mean imputation decreases the variance of the imputed variables.

  • Mean imputation artificially shrinks standard errors, which invalidates most hypothesis tests and confidence interval calculations.

  • Mean imputation does not preserve relationships between variables, such as correlations.

Example:
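
A small sketch of mean, median, and random imputation in base R, using simulated data. The variables `x`, `y`, and `x_miss`, the sample size, and the 30% missingness rate are assumptions chosen for illustration only. The comparison at the end shows the variance and correlation effects listed in the bullets above.

```r
set.seed(42)

# Simulated data (illustrative assumption): y depends linearly on x
n <- 100
x <- rnorm(n, mean = 50, sd = 10)
y <- 2 * x + rnorm(n, sd = 5)

# Remove 30 values of x completely at random (MCAR)
x_miss <- x
x_miss[sample(n, 30)] <- NA
is_missing <- is.na(x_miss)

# Mean imputation: replace each NA with the mean of the observed values
x_mean <- x_miss
x_mean[is_missing] <- mean(x_miss, na.rm = TRUE)

# Median imputation: replace each NA with the median of the observed values
x_median <- x_miss
x_median[is_missing] <- median(x_miss, na.rm = TRUE)

# Random imputation: replace each NA with a random draw from the observed values
x_random <- x_miss
x_random[is_missing] <- sample(x_miss[!is_missing], sum(is_missing), replace = TRUE)

# Variance of the observed values versus the mean-imputed variable
var(x_miss, na.rm = TRUE)
var(x_mean)

# Correlation with y on the complete cases versus after mean imputation
cor(x_miss, y, use = "complete.obs")
cor(x_mean, y)
```

Because every imputed value sits exactly at the mean, the mean-imputed variable has a smaller variance than the observed values and a weaker correlation with `y`.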