SimCorrMix

Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions

About

Generate continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count (regular or zero-inflated, Poisson or Negative Binomial) variables with a specified correlation matrix, or one continuous variable with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets (i.e., plasmodes, as in Vaughan et al., 2009, doi:10.1016/j.csda.2008.02.032). The methods extend those found in the 'SimMultiCorrData' R package.

Standard normal variables with an imposed intermediate correlation matrix are transformed to generate the desired distributions. Continuous variables are simulated using either Fleishman's (1978) third-order (doi:10.1007/BF02293811) or Headrick's (2002) fifth-order (doi:10.1016/S0167-9473(02)00072-5) polynomial transformation method (the power method transformation, PMT). Non-mixture distributions require the user to specify the mean, variance, skewness, standardized kurtosis, and standardized fifth and sixth cumulants. Mixture distributions require these inputs for each component distribution, plus the mixing probabilities.

Simulation occurs at the component level for continuous mixture distributions, so the target correlation matrix is specified in terms of correlations with the components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. The package also provides functions to approximate the expected correlations with the continuous mixture variables themselves, given the target correlations with the components.

Binary and ordinal variables are simulated using a modification of ordsample() in package 'GenOrd'. Count variables are simulated using the inverse CDF method. There are two simulation pathways, which differ in how they calculate intermediate correlations involving count variables. Correlation Method 1 adapts Yahav and Shmueli's (2012) method (doi:10.1002/asmb.901) and performs best with large count variable means and positive correlations, or with small means and negative correlations. Correlation Method 2 adapts Barbiero and Ferrari's (2015) modification of the 'GenOrd' package (doi:10.1002/asmb.2072) and performs best under the opposite scenarios. The optional error loop may be used to improve the accuracy of the final correlation matrix.

The package also contains functions to calculate the standardized cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables.
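The description above combines three ideas: correlated standard normals, component-level simulation of a continuous mixture, and inverse-CDF generation of counts. The following is a minimal, illustrative Python sketch of those ideas only. It is not the package's implementation or API: every function name here (poisson_quantile, mixture_mean_var, simulate, pearson) is hypothetical, the components are assumed to be plain normal distributions (so no Fleishman or Headrick polynomial is fitted), and the value rho is applied directly to the underlying normals rather than being derived as an intermediate correlation by Correlation Method 1 or 2.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Inverse CDF: smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # The pmf > 0 guard ends the loop if u is so close to 1 that
    # floating-point rounding keeps the running CDF below it.
    while cdf < u and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def mixture_mean_var(mix_p, means, sds):
    """Mean and variance of a mixture from its component means and SDs."""
    mean = sum(p * m for p, m in zip(mix_p, means))
    second = sum(p * (s * s + m * m) for p, m, s in zip(mix_p, means, sds))
    return mean, second - mean * mean


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, rho, mix_p, means, sds, lam, seed=1):
    """Draw n pairs (mixture value, Poisson count) from correlated normals.

    Simplifying assumption of this sketch: each component's standard normal
    has correlation rho with the normal that drives the count variable.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    ys, counts = [], []
    for _ in range(n):
        z_count = rng.gauss(0.0, 1.0)
        # One standard normal per mixture component, correlated with z_count.
        z_comps = [
            rho * z_count + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            for _ in mix_p
        ]
        # Component-level simulation (normal components: shift and scale).
        comps = [m + s * z for m, s, z in zip(means, sds, z_comps)]
        # A single multinomial draw selects which component is observed.
        pick = rng.choices(range(len(mix_p)), weights=mix_p)[0]
        ys.append(comps[pick])
        # Inverse CDF method: normal -> uniform -> Poisson quantile.
        counts.append(poisson_quantile(phi(z_count), lam))
    return ys, counts


if __name__ == "__main__":
    ys, counts = simulate(5000, 0.6, [0.4, 0.6], [-2.0, 2.0], [1.0, 1.0], 4.0)
    print("theoretical mixture mean/variance:",
          mixture_mean_var([0.4, 0.6], [-2.0, 2.0], [1.0, 1.0]))
    print("sample correlation (mixture, count):", pearson(ys, counts))
```

In this sketch the correlation between the mixture variable and the count comes out weaker than the component-level value rho, because the spread between component means adds variance to the mixture that is unrelated to the count. That gap is why a target correlation stated for the components needs to be translated into an expected correlation for the mixture variable.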

github.com/AFialkowski/SimCorrMix

Key Metrics

Version 0.1.1
R ≥ 3.4.0
Published 2018-07-01
Needs compilation? no
License GPL-2
CRAN checks SimCorrMix results

Downloads

Yesterday 10 0%
Last 7 days 27 +42%
Last 30 days 132 -7%
Last 90 days 627 +46%
Last 365 days 1,936 -19%

Maintainer

Allison Cynthia Fialkowski

allijazz@uab.edu

Authors

Allison Cynthia Fialkowski

Material

README
NEWS
Reference manual
Package source

Vignettes

Continuous Mixture Distributions
Expected Cumulants and Correlations for Continuous Mixture Variables
Comparison of Correlation Methods 1 and 2
Variable Types
Overall Workflow for Generation of Correlated Data

macOS

r-release arm64
r-oldrel arm64
r-release x86_64
r-oldrel x86_64

Windows

r-devel x86_64
r-release x86_64
r-oldrel x86_64

Old Sources

SimCorrMix archive

Depends

R ≥ 3.4.0
SimMultiCorrData ≥ 0.2.1

Imports

BB
nleqslv
MASS
mvtnorm
Matrix
VGAM
triangle
ggplot2
grid
stats
utils

Suggests

knitr
rmarkdown
printr
bookdown
testthat