To evaluate the enrichment of trait heritability in gene modules by stratified LD score regression (S_LDSC).
I compiled the workflow to perform S-LDSc into a snakemake file. Specifically, there are a few main steps,
Prepare gene sets for each module.
Make annotations for genes in each module by using SNPs within genes and an additional buffer region (100kb) around genes.
Calcualte LD scores for annotations.
Reformat GWAS sum stats.
Calculate the partitioned heritability enrichment for each pair of (module, trait).
Nmodule=166
module_seq=list(range(1, Nmodule+1))
module_seq.remove(4)
module_seq.remove(66)
MODULE=['M' + str(x) for x in module_seq]
CHR=list(range(1, 23))
path='/project2/xuanyao/llw/coloc/ukbb_coloc_blood_traits/ukbb_blood_traits.csv'
with open(path) as f:
lines = [x.split(",")[1] for x in f]
trait_seq=lines[1:]
rule all:
input:
expand('h2_enrich_par/{gwasPhenocode}_{module}_baseline.results', module=MODULE, gwasPhenocode=trait_seq)
rule prep_gene_set:
input:
file_gene_meta='/project2/xuanyao/llw/DGN_no_filter_on_mappability/result/gene.meta.txt'
file_coexp_module='/project2/xuanyao/llw/DGN_no_filter_on_mappability/result/coexp.module.rds'
output:
expand('geneset/{module}.GeneSet', module=MODULE)
script:
'/home/liliw1/Trans/ldsc/1_prep_gene_set.R'
rule make_annot:
input:
expand('geneset/{module}.GeneSet', module=MODULE)
output:
expand('ldsc_annot/{module}.{chr}.annot.gz', chr=CHR, allow_missing=True)
shell:
"""
source activate ldsc
bash /home/liliw1/Trans/ldsc/2_make_annot.sh {wildcards.module}
"""
rule ldsc_annot:
input:
expand('ldsc_annot/{module}.{chr}.annot.gz', chr=CHR, allow_missing=True)
output:
expand('ldsc_annot/{module}.{chr}.l2.ldscore.gz', chr=CHR, allow_missing=True)
shell:
"""
source activate ldsc
bash /home/liliw1/Trans/ldsc/3_ldsc_annot.sh {wildcards.module}
"""
rule prep_gwas:
input:
params:
gwasPhenocode='{gwasPhenocode}'
output:
'gwas/{gwasPhenocode}.tsv.gz'
script:
'/home/liliw1/Trans/ldsc/5_1_prep_gwas.R'
rule convert_gwas:
input:
'gwas/{gwasPhenocode}.tsv.gz'
output:
'gwas/{gwasPhenocode}.sumstats.gz'
shell:
"""
source activate ldsc
bash /home/liliw1/Trans/ldsc/5_2_convert_gwas.sh {wildcards.gwasPhenocode}
"""
rule par_h2:
input:
expand('ldsc_annot/{module}.{chr}.annot.gz', chr=CHR, allow_missing=True),
expand('ldsc_annot/{module}.{chr}.l2.ldscore.gz', chr=CHR, allow_missing=True),
expand('gwas/{gwasPhenocode}.sumstats.gz', gwasPhenocode=trait_seq)
output:
expand('h2_enrich_par/{gwasPhenocode}_{module}_baseline.results', gwasPhenocode=trait_seq, allow_missing=True)
shell:
"""
source activate ldsc
bash /home/liliw1/Trans/ldsc/6_1_ldsc_h2.sh {wildcards.module}
"""
Next, I compiled the S-LDSC results of a module for all traits into one file. In total, there are 166 modules. Here, I take one module (module 4) for an example.
# paras and I/O -----
module <- 4
files_par_h2 <- list.files('/project2/xuanyao/llw/ldsc/h2_enrich_par', paste0("^\\d+_M", module, "_baseline.results"), full.names = TRUE)
file_gwasTraitInfo <- '/project2/xuanyao/llw/coloc/ukbb_coloc_blood_traits/ukbb_blood_traits.csv'
# read data -----
par_h2 <- map_dfr(
files_par_h2,
~fread(cmd = paste("sed -n -e 1p -e 99p", .x), sep = "\t")
)
if(!all(par_h2$Category == "L2_1")) stop("Not all extracted rows are from custom annotation.")
gwasTraitInfo <- fread(file_gwasTraitInfo, sep = ",", header = TRUE)
# re-arrange data -----
# add annotation name & trait info
par_h2 <- par_h2 %>%
mutate(Category = str_extract(basename(!!files_par_h2), paste0("^\\d+_M", !!module))) %>%
separate(Category, c("trait_id", "module"), sep = "_", remove = FALSE, convert = TRUE) %>%
separate(module, c(NA, "module"), sep = "M", convert = TRUE) %>%
left_join(gwasTraitInfo, by = c("trait_id" = "GWAS ID")) %>%
relocate(`GWAS Group`, `GWAS Trait`, `Trait Abbreviation`, .after = trait_id)
The enrichment of module 4 across traits,
filter(par_h2, complete.cases(par_h2)) %>%
select(trait_id, `GWAS Group`, `GWAS Trait`, Enrichment, Enrichment_std_error, Enrichment_p) %>%
arrange(Enrichment_p) %>%
knitr::kable()
| trait_id | GWAS Group | GWAS Trait | Enrichment | Enrichment_std_error | Enrichment_p |
|---|---|---|---|---|---|
| 30100 | Platelets | Mean platelet volume | 6.6765352 | 1.2648330 | 0.0000122 |
| 30080 | Platelets | Platelet count | 3.9722416 | 0.6876781 | 0.0000149 |
| 30110 | Platelets | Platelet distribution width | 6.4866017 | 1.3775005 | 0.0000703 |
| 30000 | White blood cells | White blood cell count | 1.7044407 | 0.2371545 | 0.0031285 |
| 30090 | Platelets | Platelet crit | 2.6319136 | 0.5896392 | 0.0054441 |
| 30070 | Red blood cells | Red blood cell distribution width | 2.4668573 | 0.5518419 | 0.0086231 |
| 30180 | White blood cells | Lymphocyte percentage | 1.9901747 | 0.3821117 | 0.0096199 |
| 30140 | White blood cells | Neutrophill count | 1.7197007 | 0.2894397 | 0.0136088 |
| 30010 | Red blood cells | Red blood cell count | 2.0065541 | 0.4148286 | 0.0150241 |
| 30120 | White blood cells | Lymphocyte count | 1.8146372 | 0.3372063 | 0.0177686 |
| 30250 | Red blood cells | Reticulocyte count | 2.7844407 | 0.7293367 | 0.0202140 |
| 30200 | White blood cells | Neutrophill percentage | 1.9836064 | 0.4284491 | 0.0209660 |
| 30240 | Red blood cells | Reticulocyte percentage | 2.9140927 | 0.7922800 | 0.0219897 |
| 30300 | Red blood cells | High light scatter reticulocyte count | 2.6861645 | 0.7292037 | 0.0274268 |
| 30290 | Red blood cells | High light scatter reticulocyte percentage | 2.7459781 | 0.7675678 | 0.0300567 |
| 30020 | Red blood cells | Haemoglobin concentration | 1.6980012 | 0.3276797 | 0.0317306 |
| 30270 | Red blood cells | Mean sphered cell volume | 1.9486126 | 0.4529099 | 0.0363589 |
| 30030 | Red blood cells | Haematocrit percentage | 1.7133299 | 0.3475986 | 0.0385278 |
| 30260 | Red blood cells | Mean reticulocyte volume | 2.2668847 | 0.6506316 | 0.0490014 |
| 30050 | Red blood cells | Mean corpuscular haemoglobin | 2.7048509 | 0.8835458 | 0.0561511 |
| 30040 | Red blood cells | Mean corpuscular volume | 2.5563837 | 0.8236809 | 0.0617733 |
| 30280 | Red blood cells | Immature reticulocyte fraction | 2.7713588 | 0.9290462 | 0.0666886 |
| 30190 | White blood cells | Monocyte percentage | 1.4886842 | 0.3374385 | 0.1352258 |
| 30130 | White blood cells | Monocyte count | 1.3673806 | 0.2700425 | 0.1624942 |
| 30060 | Red blood cells | Mean corpuscular haemoglobin concentration | 1.7191593 | 0.6399688 | 0.2597777 |
| 30210 | White blood cells | Eosinophill percentage | 0.7715102 | 0.2304061 | 0.3313856 |
| 30150 | White blood cells | Eosinophill count | 0.8082552 | 0.2608841 | 0.4680572 |
| 30220 | White blood cells | Basophill percentage | 1.1889507 | 0.4896314 | 0.6946688 |
| 30160 | White blood cells | Basophill count | 0.8833706 | 0.4909206 | 0.8118197 |
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS/LAPACK: /scratch/midway2/liliw1/conda_env/rstudio-server/lib/libopenblasp-r0.3.18.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.14.2 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
## [5] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7
## [9] ggplot2_3.3.6 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.1 xfun_0.29 bslib_0.3.1 haven_2.4.3
## [5] colorspace_2.0-3 vctrs_0.4.1 generics_0.1.2 htmltools_0.5.2
## [9] yaml_2.2.2 utf8_1.2.2 rlang_1.0.3 jquerylib_0.1.4
## [13] pillar_1.7.0 withr_2.5.0 glue_1.6.2 DBI_1.1.3
## [17] dbplyr_2.1.1 modelr_0.1.8 readxl_1.3.1 lifecycle_1.0.1
## [21] cellranger_1.1.0 munsell_0.5.0 gtable_0.3.0 rvest_1.0.2
## [25] evaluate_0.14 knitr_1.37 tzdb_0.2.0 fastmap_1.1.0
## [29] fansi_1.0.3 highr_0.9 broom_0.7.12 Rcpp_1.0.8.3
## [33] backports_1.4.1 scales_1.2.0 jsonlite_1.7.3 fs_1.5.2
## [37] hms_1.1.1 digest_0.6.29 stringi_1.7.6 grid_4.1.2
## [41] cli_3.3.0 tools_4.1.2 magrittr_2.0.3 sass_0.4.0
## [45] crayon_1.5.1 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.3
## [49] reprex_2.0.1 lubridate_1.8.0 assertthat_0.2.1 rmarkdown_2.11
## [53] httr_1.4.2 rstudioapi_0.13 R6_2.5.1 compiler_4.1.2