---
title: "Using an LKJ prior in Stan"
date: "2016-03-07"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message=FALSE)
```

There are two ways to use an LKJ prior distribution for a correlation matrix in Stan. The first one assigns the distribution to the correlation matrix itself, whereas the second one assigns the distribution to the lower Cholesky factor of the correlation matrix. I am going to show an example for a trivariate normal sample with a fixed mean:
$$
y_i \sim_{\text{iid}} {\cal N}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \Sigma\right).
$$
Recall the relation between the covariance matrix and the correlation matrix $\Omega$:
$$
\begin{pmatrix}
\sigma_{1}^2 & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2
\end{pmatrix}
=
\begin{pmatrix}
\sigma_{1} & 0 & 0 \\
0 & \sigma_2 & 0 \\
0 & 0 & \sigma_3
\end{pmatrix}
\Omega
\begin{pmatrix}
\sigma_{1} & 0 & 0 \\
0 & \sigma_2 & 0 \\
0 & 0 & \sigma_3
\end{pmatrix}
$$
This operation is performed in Stan by the function `quad_form_diag`.

I do not assume the reader is familiar with Stan or the `rstan` package, so I will comment on each step. First I load the `rstan` package with the usual options:

```{r}
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

I will run Stan on these simulated data:

```{r}
set.seed(666)
Omega <- rbind(
  c(1, 0.3, 0.2),
  c(0.3, 1, 0.1),
  c(0.2, 0.1, 1)
)
sigma <- c(1, 2, 3)
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)
N <- 100
y <- mvtnorm::rmvnorm(N, c(0,0,0), Sigma)
```

## Prior on the correlation matrix

Below is the Stan code for the Bayesian model assigning an LKJ prior to the correlation matrix $\Omega$. I use the LKJ distribution with shape parameter $1$, which is the uniform distribution on the space of correlation matrices.
```{r}
stancode1 <- 'data {
  int N; // number of observations
  int J; // dimension of observations
  vector[J] y[N]; // observations
  vector[J] Zero; // a vector of zeros (fixed means of observations)
}
parameters {
  corr_matrix[J] Omega;
  vector<lower=0>[J] sigma; // standard deviations, constrained to be positive
}
transformed parameters {
  cov_matrix[J] Sigma;
  Sigma <- quad_form_diag(Omega, sigma);
}
model {
  y ~ multi_normal(Zero, Sigma); // sampling distribution of the observations
  sigma ~ cauchy(0, 5); // prior on the standard deviations
  Omega ~ lkj_corr(1); // LKJ prior on the correlation matrix
}'
```

The `stan_model` function of the `rstan` package runs the Stan compilation of the model.

```{r compilation1, eval=TRUE, cache=TRUE}
stanmodel1 <- stan_model(model_code = stancode1, model_name = "stanmodel1")
```

Note that this function only takes the Stan code as input. Once the model is compiled, the `stanmodel` object can be used to run the model on different datasets with the `sampling` function. The data must be passed to the `sampling` function in a list:

```{r}
standata <- list(J = ncol(y), N = N, y = y, Zero = rep(0, ncol(y)))
```

The algorithms used by Stan to generate the posterior distributions require initial values for the parameters. One can let the `sampling` function generate random initial values, or pass them via its `init` argument. I prefer to provide my own initial values. More precisely, one must pass to the `init` argument a function that returns the initial values in a list (see `?sampling`). Here is how I generate initial values. First, I write a function that returns some estimates of the parameters, taking the observations as input, with an option to randomly perturb them:

```{r}
estimates <- function(y, perturb = FALSE){
  if(perturb) y <- y + rnorm(length(y), 0, 1)
  sigma <- sqrt(diag(var(y)))
  Omega <- cor(y)
  return(list(sigma = sigma, Omega = Omega))
}
```

I run Stan with several chains, for instance four.
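As a quick sanity check of these initial values, the estimates computed from the unperturbed data should be close to the true parameters. Here is a self-contained sketch that re-simulates comparable data with base R only (using the Cholesky factor of `Sigma` instead of `mvtnorm::rmvnorm`):

```{r}
set.seed(666)
Omega <- rbind(c(1, 0.3, 0.2), c(0.3, 1, 0.1), c(0.2, 0.1, 1))
sigma <- c(1, 2, 3)
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)
# simulate 100 trivariate normal observations with base R:
# if z has identity covariance, z %*% chol(Sigma) has covariance Sigma
y <- matrix(rnorm(100 * 3), 100, 3) %*% chol(Sigma)
# same estimates as in the `estimates` function above:
# sample standard deviations and sample correlation matrix
sigma_hat <- sqrt(diag(var(y)))
Omega_hat <- cor(y)
round(sigma_hat, 2)  # should be close to c(1, 2, 3)
```

With only $100$ observations these estimates are rough, but they are good enough as starting points for the chains.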
The function passed to the `init` argument of the `sampling` function takes an optional argument `chain_id`. For the first chain, I use the estimates calculated from the original data as initial values; for the other chains, I use estimates calculated from perturbed data:

```{r}
inits <- function(chain_id){
  values <- estimates(standata$y, perturb = chain_id > 1)
  return(values)
}
```

Now we are ready to run Stan:

```{r sampling1}
samples1 <- sampling(stanmodel1, data = standata, init = inits, 
                     iter = 10000, warmup = 1000, chains = 4)
```

Some numerical problems occur, but they are benign. It is not abnormal to get messages such as

```
validate transformed params: Sigma is not positive definite
validate transformed params: Sigma is not symmetric. Sigma[1,2] = -nan, but Sigma[2,1] = -nan
Exception thrown at line 23: lkj_corr_log: y is not positive definite
validate transformed params: Sigma is not symmetric. Sigma[1,2] = inf, but Sigma[2,1] = inf
```

These problems will not occur with the LKJ prior on the Cholesky factor. I like to use the `coda` package for output analysis. This is how I store the samples in a `coda` object:

```{r}
library(coda)
codasamples1 <- do.call(mcmc.list, 
  plyr::alply(rstan::extract(samples1, 
                             pars = c("sigma", "Omega[1,2]", "Omega[1,3]", "Omega[2,3]"), 
                             permuted = FALSE), 
              2, mcmc))
```

## Prior on the Cholesky factor

The correlation matrix $\Omega$ has a Cholesky factorization $\Omega = LL'$ where $L$ is a lower triangular matrix. Instead of assigning a prior distribution to $\Omega$, one can assign a prior distribution to $L$. This overcomes the numerical problems encountered with the previous approach, and it is also faster.
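To make the factorization concrete, here is what it looks like in R for the correlation matrix used in the simulation. Note that `chol` returns the *upper* triangular factor, hence the transpose:

```{r}
Omega <- rbind(c(1, 0.3, 0.2),
               c(0.3, 1, 0.1),
               c(0.2, 0.1, 1))
L <- t(chol(Omega))  # lower triangular Cholesky factor
# multiplying L by its transpose recovers Omega; this is what
# multiply_lower_tri_self_transpose does in the Stan code below
L %*% t(L)
```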
```{r}
stancode2 <- 'data {
  int N; // number of observations
  int J; // dimension of observations
  vector[J] y[N]; // observations
  vector[J] Zero; // a vector of zeros (fixed means of observations)
}
parameters {
  cholesky_factor_corr[J] Lcorr;
  vector<lower=0>[J] sigma; // standard deviations, constrained to be positive
}
model {
  y ~ multi_normal_cholesky(Zero, diag_pre_multiply(sigma, Lcorr));
  sigma ~ cauchy(0, 5);
  Lcorr ~ lkj_corr_cholesky(1);
}
generated quantities {
  matrix[J,J] Omega;
  matrix[J,J] Sigma;
  Omega <- multiply_lower_tri_self_transpose(Lcorr);
  Sigma <- quad_form_diag(Omega, sigma);
}'
```

Note the `generated quantities` block, as compared to the `transformed parameters` block in the first code. The objects declared in the `transformed parameters` block are intended to be used in the likelihood of the data, whereas the objects declared in the `generated quantities` block are not. Now we only have to change the `estimates` function:

```{r}
estimates <- function(y, perturb = FALSE){
  if(perturb) y <- y + rnorm(length(y), 0, 1)
  Lcorr <- t(chol(cor(y)))
  sigma <- sqrt(diag(var(y)))
  return(list(Lcorr = Lcorr, sigma = sigma))
}
```

Then compile and run as before:

```{r compilation2, eval=TRUE, cache=TRUE}
stanmodel2 <- stan_model(model_code = stancode2, model_name = "stanmodel2")
samples2 <- sampling(stanmodel2, data = standata, init = inits, 
                     iter = 10000, warmup = 1000, chains = 4)
```

```{r}
library(coda)
codasamples2 <- do.call(mcmc.list, 
  plyr::alply(rstan::extract(samples2, 
                             pars = c("sigma", "Omega[1,2]", "Omega[1,3]", "Omega[2,3]"), 
                             permuted = FALSE), 
              2, mcmc))
```

## Comparison of results

The results of the two models are almost identical:

```{r}
summary(codasamples1)
summary(codasamples2)
```
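As a final check, the relation between the covariance matrix, the standard deviations, and the correlation matrix that both models rely on can be verified directly in R, independently of the Stan fits. The base function `cov2cor` inverts the operation that `quad_form_diag` performs in Stan:

```{r}
Omega <- rbind(c(1, 0.3, 0.2),
               c(0.3, 1, 0.1),
               c(0.2, 0.1, 1))
sigma <- c(1, 2, 3)
# quad_form_diag(Omega, sigma) in Stan computes diag(sigma) %*% Omega %*% diag(sigma)
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)
# going back from the covariance matrix to the correlation matrix
cov2cor(Sigma)
```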