The R packages below handle missing values in clustering, multilevel data analysis, and high-dimensional data analysis.

clusterMI

clusterMI is an R package for clustering with missing values, which it handles through multiple imputation. The package offers several multiple imputation methods dedicated to clustered individuals (Audigier et al. 2021). It also pools results across imputed data sets, both in terms of partition and instability (Audigier and Niang 2023). One application of these functionalities is choosing the number of clusters when values are missing.

More details are available in the associated vignette.
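The two steps described above (multiple imputation, then pooling of the partitions) can be sketched as follows. This is a minimal illustration, assuming the `wine` data set and the `prodna()`, `imputedata()` and `clusterMI()` functions behave as in the package documentation; the number of clusters (3) and the number of imputed data sets (5) are arbitrary choices made for speed, not recommendations.

```r
library(clusterMI)

set.seed(123456)
data(wine)                    # data set shipped with clusterMI
wine.na <- wine
wine.na$cult <- NULL          # drop the reference partition
wine.na <- prodna(wine.na)    # add missing values at random (illustration only)

# Step 1: multiple imputation accounting for a 3-cluster structure
res.imp <- imputedata(data.na = wine.na, nb.clust = 3, m = 5)

# Step 2: cluster each imputed data set and pool the results
res.pool <- clusterMI(res.imp)

table(res.pool$part)          # pooled partition
res.pool$instability          # pooled instability
```

The pooled partition assigns one cluster label to each individual, while the instability summarises how much the partition varies within and between imputed data sets.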

micemd

micemd is an R package dedicated to multiple imputation with two-level data.

Why use micemd?

Statistical analysis often requires allowance for a multilevel structure. For example, a two-level structure occurs when individual data from several studies are aggregated, as in individual participant data (IPD) meta-analysis: individuals are at the lower level, and studies at the higher level. However, the variables of each study are often incomplete (sporadically missing values) and often differ between studies (leading to systematically missing variables), which makes such data challenging to analyse. micemd offers several solutions to overcome these issues.

What are its functionalities?

micemd is an add-on for the mice R package, which performs multiple imputation by chained equations. Its additional functionalities consist of:

- imputation methods dedicated to:
  - sporadically and systematically missing values
  - continuous, binary or count variables
- tools for multiple imputation with mice:
  - parallel calculation
  - choice of the number of imputed tables
  - overimputation for model checking
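As an illustration of these functionalities, the sketch below imputes a small two-level data set in parallel. The `toy` data set is simulated here and purely hypothetical. The calls to `find.defaultMethod()` and `mice.par()` assume the signatures given in the micemd documentation, and the coding of the predictor matrix (-2 for the cluster variable, 2 for covariates with random effects) follows the usual mice convention for two-level methods.

```r
library(mice)
library(micemd)

set.seed(1)
# Hypothetical two-level data: 10 studies of 50 individuals each
n.study <- 10
n.ind <- 50
study <- rep(seq_len(n.study), each = n.ind)
x <- rnorm(n.study * n.ind)
y <- 1 + 0.5 * x + rnorm(n.study)[study] + rnorm(n.study * n.ind)
toy <- data.frame(study = study, x = x, y = y)
toy$y[sample(nrow(toy), 75)] <- NA        # sporadically missing values

ind.clust <- 1                            # column index of the cluster variable

# Predictor matrix: -2 flags the cluster variable, 2 the other predictors
pred <- make.predictorMatrix(toy)
pred[ind.clust, ind.clust] <- 0
pred[-ind.clust, ind.clust] <- -2
pred[pred == 1] <- 2

# Imputation method suggested by micemd for each incomplete variable
meth <- find.defaultMethod(toy, ind.clust)

# Multiple imputation by chained equations, run on 2 nodes in parallel
imp <- mice.par(toy, m = 2, method = meth, predictorMatrix = pred, nnodes = 2)
```

The result is a standard `mids` object, so the usual mice tools (`complete()`, `with()`, `pool()`) apply to it.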

missMDA

missMDA is an R package that allows you to:

- handle missing values in exploratory multivariate analysis such as principal component analysis (PCA), multiple correspondence analysis (MCA), factor analysis for mixed data (FAMD) and multiple factor analysis (MFA)
- impute missing values for:
  - continuous variables using the PCA model
  - categorical variables using MCA
  - mixed data using FAMD
- generate multiple imputed data sets:
  - for continuous data using the PCA model
  - for categorical data using MCA
- visualize multiple imputation in PCA and MCA

To apply a statistical method to an incomplete data set using missMDA, see this vignette.
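For continuous data, single and multiple imputation with the PCA model can be sketched as follows. This is a minimal example, assuming the `orange` data set shipped with missMDA; the number of dimensions (`ncp = 2`) and the number of imputed data sets (`nboot = 20`) are illustrative choices, and `estim_ncpPCA()` can be used to estimate the former by cross-validation.

```r
library(missMDA)

data(orange)    # sensory data set with missing values, shipped with missMDA

# Single imputation with the PCA model
res.imp <- imputePCA(orange, ncp = 2)
orange.complete <- res.imp$completeObs    # completed data set

# Multiple imputation, to reflect the uncertainty due to missing values
res.mi <- MIPCA(orange, ncp = 2, nboot = 20)
length(res.mi$res.MI)                     # number of imputed data sets
```

The completed data set can then be passed to a standard PCA function such as `FactoMineR::PCA()`, and `plot(res.mi)` displays the variability across imputed data sets.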

R packages dealing with missing data are listed in the CRAN Task View on missing data (Josse et al. 2025).

References

Audigier, Vincent, and Ndèye Niang. 2023. “Clustering with Missing Data: Which Equivalent for Rubin’s Rules?” Advances in Data Analysis and Classification 17 (3): 623–57. https://arxiv.org/pdf/2011.13694.

Audigier, Vincent, Ndèye Niang, and Matthieu Resche-Rigon. 2021. Clustering with Missing Data: Which Imputation Model for Which Cluster Analysis Method? https://arxiv.org/abs/2106.04424.

Josse, Julie, Imke Mayer, Nicholas Tierney, and Nathalie Vialaneix. 2025. CRAN Task View: Missing Data. https://cran.r-project.org/web/views/MissingData.html.
