FLOW-MR-tutorial • FLOWMR

Introduction

FLOW-MR is an R package for performing Mendelian Randomization under the mediation setting. This tutorial demonstrates how to use the package step by step.

Installation

# Install devtools if not installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}


# Install FLOW-MR
devtools::install_github("ZixuanWu1/FLOW-MR")
#> rlang        (1.1.4    -> 1.1.5   ) [CRAN]
#> cli          (3.6.3    -> 3.6.4   ) [CRAN]
#> pillar       (1.9.0    -> 1.10.1  ) [CRAN]
#> R6           (2.5.1    -> 2.6.1   ) [CRAN]
#> Rcpp         (1.0.13   -> 1.0.14  ) [CRAN]
#> withr        (3.0.1    -> 3.0.2   ) [CRAN]
#> gtable       (0.3.5    -> 0.3.6   ) [CRAN]
#> RcppArmad... (14.0.2-1 -> 14.2.3-1) [CRAN]
#> 
#> The downloaded binary packages are in
#>  /var/folders/1p/kyg0zmtx4gdg27q_5x4d78fc0000gn/T//RtmpL9P6lU/downloaded_packages
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/private/var/folders/1p/kyg0zmtx4gdg27q_5x4d78fc0000gn/T/RtmpL9P6lU/remotes17574441c9acc/ZixuanWu1-FLOW-MR-888679f/DESCRIPTION’ ... OK
#> * preparing ‘FLOWMR’:
#> * checking DESCRIPTION meta-information ... OK
#> * cleaning src
#> Warning: bad markup (extra space?) at BayesMediation.Rd:48:21
#> Warning: bad markup (extra space?) at BayesMediation.Rd:49:22
#> Warning: bad markup (extra space?) at BayesMediation.Rd:50:26
#> Warning: bad markup (extra space?) at BayesMediation.Rd:52:27
#> Warning: bad markup (extra space?) at BayesMediation.Rd:54:29
#> Warning: bad markup (extra space?) at BayesMediation.Rd:56:30
#> Warning: bad markup (extra space?) at BayesMediation.Rd:58:17
#> Warning: bad markup (extra space?) at BayesMediation.Rd:59:18
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:57:21
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:58:22
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:59:26
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:61:27
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:63:29
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:65:30
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:67:17
#> Warning: bad markup (extra space?) at MultivariateMediation.Rd:68:18
#> Warning: bad markup (extra space?) at gibbs_wrapper.Rd:64:9
#> Warning: bad markup (extra space?) at gibbs_wrapper.Rd:65:9
#> Warning: bad markup (extra space?) at gibbs_wrapper.Rd:66:14
#> Warning: bad markup (extra space?) at gibbs_wrapper.Rd:67:14
#> Warning: bad markup (extra space?) at gibbs_wrapper.Rd:68:10
#> Warning: bad markup (extra space?) at gibbs_wrapper.Rd:69:13
#> Warning: bad markup (extra space?) at gibbs_wrapper_multivariate.Rd:71:9
#> Warning: bad markup (extra space?) at gibbs_wrapper_multivariate.Rd:72:9
#> Warning: bad markup (extra space?) at gibbs_wrapper_multivariate.Rd:73:14
#> Warning: bad markup (extra space?) at gibbs_wrapper_multivariate.Rd:74:14
#> Warning: bad markup (extra space?) at gibbs_wrapper_multivariate.Rd:75:10
#> Warning: bad markup (extra space?) at gibbs_wrapper_multivariate.Rd:76:13
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:24:12
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:25:16
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:26:10
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:27:13
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:28:12
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:29:14
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:30:11
#> Warning: bad markup (extra space?) at summary_gibbs.Rd:31:13
#> Warning: bad markup (extra space?) at zero.centered.em.Rd:42:10
#> Warning: bad markup (extra space?) at zero.centered.em.Rd:43:10
#> Warning: bad markup (extra space?) at zero.centered.em.Rd:44:10
#> * installing the package to process help pages
#> * saving partial Rd database
#> * cleaning src
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘FLOWMR_1.0.tar.gz’

Load the Package

library(FLOWMR)

Example Usage

Step 1: Prepare Input Data

Here, we use GRAPPLE to preprocess the input data. We can download the GRAPPLE package using the following command:

devtools::install_github("jingshuw/grapple")

Next, we use GRAPPLE to preprocess the data following a three-sample Mendelian Randomization (MR) design. The sel.file is used to select genome-wide significant SNPs, the exp.file contains the exposures of interest, and the out.file contains the outcomes of interest. For more details, visit GRAPPLE on GitHub.

In this example we use adult BMI from GIANT and childhoood BMI from EGG as selections files. We use childhood body BMI from MOBA, adult BMI from UK biobank as exposures, and Breast Cancer from this paper as outcome. One can download the traits from this link.

library(GRAPPLE)

# Selection file of snps
sel.file <- c("BMI-giant17eu.csv", "BMIchild-egg15.csv" )

# Exposure file
exp.file <- c( "BMI8year-moba19","BMIadult-ukb.csv" )

# Outcome file
out.file <- "Breast-Micha17erp.csv"

# Use plink to select independent significant SNPs
plink_refdat <- "data_maf0.01_rs_ref/data_maf0.01_rs_ref"

## Use max.p.thres to decide the significance level of SNPs
## Use cal.cor = T to compute the noise correlation
data.list <- GRAPPLE::getInput(sel.file, exp.file, out.file, plink_refdat, max.p.thres =1e-3,
                               plink_exe = "plink_mac_20210606/plink", cal.cor = T)
#> [1] "Marker candidates will not be obtained as number of risk factors k > 1"
#> [1] "loading data for selection: BMI-giant17eu.csv ..."
#> [1] "loading data for selection: BMIchild-egg15.csv ..."
#> [1] "loading data from exposure: BMI8year-moba19 ..."
#> [1] "loading data from exposure: BMIadult-ukb.csv ..."
#> [1] "loading data from outcome: Breast-Micha17erp.csv ..."
#> [1] "Start clumping using PLINK ..."
#> [1] "487 independent genetic instruments extracted. Done!"

Step 2: Run FLOW-MR

To run FLOW-MR, we need to prepare two input files, Gamma_hat and Sd_hat. Also ensuring that the time order is reversed. Then the function BayesMediation will execute the Gibbs sampler.

Here, you can optionally provide cor_mat, a K by K matrix which represents the shared noise correlation between GWAS summary statistics across traits. This is particularly useful when the GWAS summary statistics include overlapping samples. GRAPPLE estimates the noise correlation by first selecting non-significant SNPs (e.g., p-value > 0.5) and then computing the correlation between the estimated effect sizes. The default value is a diagonal matrix.

dat <-data.list$data;

# Run the mediation method
Gamma_hat =rbind(dat$gamma_out1, 
                 dat$gamma_exp2,
                 dat$gamma_exp1)

Sd_hat = rbind(dat$se_out1, 
               dat$se_exp2,
               dat$se_exp1)

cor_mat = data.list$cor.mat[3:1, 3:1]

result = BayesMediation(Gamma_hat, Sd_hat, cor = cor_mat, inv = TRUE)
#> [1] "2025-03-17 17:48:54 CDT"
#> [1] "2025-03-17 17:50:38 CDT"

Step 3: Look at summary of direct effects

In this step, we print the summary of direct effects, where each row corresponds to a parameter.

print(result$summary)
#>                   mean          var           sd         2.5%          50%
#> B[1,2]     0.158571124 1.708075e-02 0.1306933512 -0.088594706  0.155708787
#> B[1,3]    -0.416820534 3.037425e-02 0.1742821007 -0.772970801 -0.408581119
#> B[2,3]     0.729903282 8.361712e-03 0.0914423953  0.564416591  0.726111975
#> sigma      0.432940868 1.258926e-02 0.1122018645  0.273914991  0.412901419
#> sigma1[1]  0.025897252 1.995543e-05 0.0044671501  0.018968703  0.025298456
#> sigma1[2]  0.025400411 5.691362e-05 0.0075441115  0.015802744  0.023753648
#> sigma1[3]  0.046966628 9.573373e-05 0.0097843616  0.032002343  0.045542140
#> sigma0[1]  0.004279803 7.322883e-07 0.0008557385  0.002882453  0.004194274
#> sigma0[2]  0.008336840 7.043863e-07 0.0008392773  0.006698575  0.008356154
#> sigma0[3]  0.010865672 1.488539e-06 0.0012200570  0.008584134  0.010815020
#> p[1]       0.096893667 1.223057e-03 0.0349722279  0.039493403  0.093387522
#> p[2]       0.023916549 3.255310e-04 0.0180424783  0.002547653  0.019647269
#> p[3]       0.042358001 2.525140e-04 0.0158906899  0.017389480  0.040189888
#>                  97.5%        ESS      Rhat
#> B[1,2]     0.421482632  496.41018 1.0033982
#> B[1,3]    -0.086527013  311.31244 1.0034254
#> B[2,3]     0.928726204   60.24373 1.0086724
#> sigma      0.702160361 4455.55279 1.0009391
#> sigma1[1]  0.036117189 1364.89387 1.0042282
#> sigma1[2]  0.044571476 2874.80598 0.9998832
#> sigma1[3]  0.069936393 1211.18638 1.0008464
#> sigma0[1]  0.006155495  178.22450 1.0449098
#> sigma0[2]  0.009872035  364.44844 1.0050532
#> sigma0[3]  0.013334242  371.98046 1.0006414
#> p[1]       0.173356366  648.51063 1.0095333
#> p[2]       0.070259503  779.62068 1.0019857
#> p[3]       0.079158536 1734.45765 1.0011530

In this summary, we provide the estimated posterior mean, variance, standard deviation, 2.5% quantile, median, 97.5% quantile, effective sample size (ESS), and Gelman-Rubin Rhat statistic for each parameter. The ESS quantifies the number of independent samples, while Rhat (the Gelman-Rubin statistic) assesses convergence. See ESS and Rhat for a detailed explanations of ESS and Rhat.

For the meaning of each parameter, B[l, k] represents the direct effect of the $k$ -th latest trait on the $l$ -th latest trait. For example: B[1, 2] represents the direct effect of adult BMI on breast cancer; B[1, 3] represents the direct direct effect of childhood BMI on breast cancer; B[2, 3] represents the direct direct effect of childhood BMI on adult BMI.

See the figure below for a visualization.

knitr::include_graphics("https://zixuanwu1.github.io/FLOW-MR/articles/FLOW-MR-tutorial_files/figure-html/EffectVisual.png")

For the meaning of other parameters, we assume the following priors in our manuscript:

$\alpha_{kj} \sim (1 - p_k)N(0, \sigma_{k0}^2) + p_k N(0, \sigma_{k1}^2)$

$\beta_{kl} \sim N(0, \sigma^2), \quad l > k$

We also include the posterior summary of these parameters. In the summary, sigma1[k] represents the standard deviation of the slab component for trait $k$ , sigma0[k] represents the standard deviation of the spike component for trait $k$ , p[k] represents the proportion of the slab component, and sigma represents the standard deviation of $B$ .

The effects should be considered as significant if its credible interval does not include 0. For example, here B[2, 3] and B[1, 3] are found to be significant, while B[1,2] is not.

Step 4 (Optional) Working with Raw outputs

In most cases, the summary table provides sufficient information. However, in certain situations, it may be useful to directly access the posterior samples, such as when you need to obtain other quantiles. In these cases, you can follow the procedure outlined below.

In the Gibbs sampler, we ran four chains in parallel. To access information from the i-th chain, you can use the command result$raw[[i]]. For example result$raw[[1]]$B is a three dimensional array of dimension K by K by N, where K is the number of traits and N is the number of iterations. Similarly, result$raw[[1]]$sigma1, result$raw[[1]]$sigma0, result$raw[[1]]$p are two dimensional arrays of dimension K by N where K is the number of traits and N is the number of iterations.

To simplify the procedure, one can use the function get_posterior_samples in FLOWMR to extract posterior samples from all four chains for a particular parameter after a user-defined warm-up period. The parameter must be one of “B”, “sigma1”, “sigma0”, “sigma”, “p”.

For example, here we extract the posterior samples for B[1,3] after 3000 warm-up periods

post_B = get_posterior_samples(result$raw, par = "B", K = 3, ind = c(1,3), warmup = 3000)

We can check its quantiles:

quantile(post_B, c(0.025, .5, 0.975))
#>        2.5%         50%       97.5% 
#> -0.77297080 -0.40858112 -0.08652701

And we can also draw a histogram of it for visualization:

hist(post_B, breaks = 30, main = "Histogram of posterior samples of direct effects of X3 on X1")

For sigma1, sgima0 and p, the index argument have to be 1 dimensional. For example

post_p = get_posterior_samples(result$raw, par = "p", K = 3, ind = 1, warmup = 3000)
hist(post_p,  breaks = 30, main = "Histogram of posterior samples of p1")

For sigma, the index argument should be NULL:

post_sigma = get_posterior_samples(result$raw, par = "sigma", K = 3, ind = NULL, warmup = 3000)
hist(post_sigma,  breaks = 30, main = "Histogram of posterior samples of sigma")

Step 5 (Optional) MCMC diagnoistics.

If non-convergence issues arise (e.g., an Rhat value greater than 1.1), you can generate a trace plot of the posterior samples using the function traceplot:

FLOWMR::traceplot(result$raw, par = "B", ind = c(1,2)) # for B

FLOWMR::traceplot(result$raw, par = "sigma1", ind = c(1)) # for sigma1, sigma0, p

The traceplot is a useful tool for diagnosing MCMC convergence (See this in rstan for example link. In the traceplot, each line corresponds to posterior samples from one MCMC chain. Non-convergence is indicated when the posterior samples from the four chains do not appear similar.

If convergence issues arise, consider increasing the number of iterations or adjusting the prior parameters

Step 6 Estimating path-wise / indirect effects

To estimate path-wise effects, use the following command. For example, the function indirect_effect evaluates the indirect effect of childhood BMI on breast cancer through adult BMI:

path_effect = indirect_effect(result$raw, K = 3, path = c(3,2,1),  warmup = 3000)
print(path_effect)
#>        Mean        2.5%       50%     97.5%
#> 1 0.1197818 -0.06143416 0.1103945 0.3452347

The path parameter specifies the path you are interested it. In this case it is 3 -> 2 -> 1.

This can also be done by first extracting the posterior samples of `B[1, 2]’ and ‘B[2, 3]’, then multiply them together

post_B1 = get_posterior_samples(result$raw, par = "B", K = 3, ind = c(1,2), warmup = 3000)
post_B2 = get_posterior_samples(result$raw, par = "B", K = 3, ind = c(2,3), warmup = 3000)
print(quantile(post_B1 * post_B2, c(0.025, .5, .975)))
#>        2.5%         50%       97.5% 
#> -0.06143416  0.11039454  0.34523474

Alternatively, instead of focusing on a specific path, you can estimate the sum of effects from all non-direct paths. The function indirect_effect summarizes the posterior samples of all non-direct path-wise effects. It outputs array df of dimension K by K by 4, where the first two dimensions correspond to the dimensions of B, the third dimension contains the mean, 2.5/% quantile, median, 97.5% quantile respectively.

ind_effect = indirect_effect(result$raw, K = 3, path = "all",  warmup = 3000)
print(ind_effect)
#> , , Mean
#> 
#>      [,1] [,2]      [,3]
#> [1,]    0    0 0.1197818
#> [2,]    0    0 0.0000000
#> [3,]    0    0 0.0000000
#> 
#> , , Quantile_0.025
#> 
#>      [,1] [,2]        [,3]
#> [1,]    0    0 -0.06143416
#> [2,]    0    0  0.00000000
#> [3,]    0    0  0.00000000
#> 
#> , , Quantile_0.5
#> 
#>      [,1] [,2]      [,3]
#> [1,]    0    0 0.1103945
#> [2,]    0    0 0.0000000
#> [3,]    0    0 0.0000000
#> 
#> , , Quantile_0.975
#> 
#>      [,1] [,2]      [,3]
#> [1,]    0    0 0.3452347
#> [2,]    0    0 0.0000000
#> [3,]    0    0 0.0000000

We can also examine the posterior inference for the total effects, which represent the sum of the direct and indirect effects.

tot_effect = total_effect(result$raw, K = 3, warmup = 3000)
print(tot_effect)
#> , , Mean
#> 
#>      [,1]      [,2]       [,3]
#> [1,]    0 0.1585711 -0.2970388
#> [2,]    0 0.0000000  0.7299033
#> [3,]    0 0.0000000  0.0000000
#> 
#> , , Quantile_0.025
#> 
#>      [,1]        [,2]       [,3]
#> [1,]    0 -0.08859471 -0.4794140
#> [2,]    0  0.00000000  0.5644166
#> [3,]    0  0.00000000  0.0000000
#> 
#> , , Quantile_0.5
#> 
#>      [,1]      [,2]       [,3]
#> [1,]    0 0.1557088 -0.2931256
#> [2,]    0 0.0000000  0.7261120
#> [3,]    0 0.0000000  0.0000000
#> 
#> , , Quantile_0.975
#> 
#>      [,1]      [,2]       [,3]
#> [1,]    0 0.4214826 -0.1284121
#> [2,]    0 0.0000000  0.9287262
#> [3,]    0 0.0000000  0.0000000

Finally, we demonstrate how to extract posterior samples of total, indirect, and path-wise effects, similar to how we did for the direct effects.

For each total effect, we use the function get_total_samples.

tot_samples = get_total_samples(result$raw, 3, ind = c(1,3))
hist(tot_samples, breaks = 30,  main = "Histogram of posterior samples of total effects of X3 on X1")

For path-wise effect, we use the function get_indirect_samples.

path_samples = get_indirect_samples(result$raw, 3, path = c(3,2,1))
hist(path_samples, breaks = 30,  main = "Histogram of posterior samples of the effect of X3 on X1 through X2")

For the joint effect of non-direct paths, we still use the function get_indirect_samples. In this case we need to set path to be “all” and specify the index of interests. In this case, since there is only 1 indirect path, the indirect effect will be identical to the path-wise effect we just examined

ind_samples = get_indirect_samples(result$raw, 3, path = "all", ind = c(1,3))
hist(ind_samples,  breaks = 30,  main = "Histogram of posterior samples of indirect effects of X3 on X1")

Conclusion

This tutorial introduced the key functionalities of FLOW-MR. For more details, refer to the package documentation.