Last updated: 2023-05-29
Checks: 6 1
Knit directory:
ChromatinSplicingQTLs/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20191126)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d49ae42. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: code/.DS_Store
Ignored: code/.RData
Ignored: code/._report.html
Ignored: code/.ipynb_checkpoints/
Ignored: code/.snakemake/
Ignored: code/APA_Processing/
Ignored: code/Alignments/
Ignored: code/ChromHMM/
Ignored: code/ENCODE/
Ignored: code/ExpressionAnalysis/
Ignored: code/ExtractPhenotypeBedByGenotype.py
Ignored: code/FastqFastp/
Ignored: code/FastqFastpSE/
Ignored: code/FastqSE/
Ignored: code/FineMapping/
Ignored: code/Genotypes/
Ignored: code/H3K36me3_CutAndTag.pdf
Ignored: code/IntronSlopes/
Ignored: code/LR.bed
Ignored: code/LR.seq.bed
Ignored: code/LongReads/
Ignored: code/Metaplots/
Ignored: code/Misc/
Ignored: code/MiscCountTables/
Ignored: code/Multiqc/
Ignored: code/Multiqc_chRNA/
Ignored: code/NonCodingRNA/
Ignored: code/NonCodingRNA_annotation/
Ignored: code/PairwisePi1Traits.P.all.txt.gz
Ignored: code/PeakCalling/
Ignored: code/Phenotypes/
Ignored: code/PlotGruberQTLs/
Ignored: code/PlotQTLs/
Ignored: code/ProCapAnalysis/
Ignored: code/QC/
Ignored: code/QTL_SNP_Enrichment/
Ignored: code/QTLs/
Ignored: code/RPKM_tables/
Ignored: code/ReadLengthMapExperiment/
Ignored: code/ReadLengthMapExperimentResults/
Ignored: code/ReadLengthMapExperimentSpliceCounts/
Ignored: code/ReferenceGenome/
Ignored: code/Rplots.pdf
Ignored: code/Session.vim
Ignored: code/SmallMolecule/
Ignored: code/SplicingAnalysis/
Ignored: code/TODO
Ignored: code/Tehranchi/
Ignored: code/alias/
Ignored: code/bigwigs/
Ignored: code/bigwigs_FromNonWASPFilteredReads/
Ignored: code/config/.DS_Store
Ignored: code/config/._.DS_Store
Ignored: code/config/.ipynb_checkpoints/
Ignored: code/config/config.local.yaml
Ignored: code/dag.pdf
Ignored: code/dag.png
Ignored: code/dag.svg
Ignored: code/data/
Ignored: code/debug.ipynb
Ignored: code/debug_python.ipynb
Ignored: code/deepTools/
Ignored: code/featureCounts/
Ignored: code/featureCountsBasicGtf/
Ignored: code/genome_config.yaml
Ignored: code/gwas_summary_stats/
Ignored: code/hyprcoloc/
Ignored: code/igv_session.xml
Ignored: code/isoseqbams/
Ignored: code/log
Ignored: code/logs/
Ignored: code/notebooks/.ipynb_checkpoints/
Ignored: code/pi1/
Ignored: code/polyA.Splicing.Subset_YRI.NominalPassForColoc.bed.bgz
Ignored: code/rules/.ipynb_checkpoints/
Ignored: code/rules/OldRules/
Ignored: code/rules/notebooks/
Ignored: code/salmontest/
Ignored: code/scratch/
Ignored: code/scripts/.Plot_mbv.R.swp
Ignored: code/scripts/.ipynb_checkpoints/
Ignored: code/scripts/GTFtools_0.8.0/
Ignored: code/scripts/__pycache__/
Ignored: code/scripts/liftOverBedpe/liftOverBedpe.py
Ignored: code/snakemake.dryrun.log
Ignored: code/snakemake.log
Ignored: code/snakemake.sbatch.log
Ignored: code/snakemake_profiles/slurm/__pycache__/
Ignored: code/test.introns.bed
Ignored: code/test.introns2.bed
Ignored: code/test.log
Ignored: code/tracks.xml
Ignored: data/.DS_Store
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-10.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-11.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-2.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-3.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-4.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-5.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-6.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-7.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-8.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022.csv
Ignored: data/Metaplots/.DS_Store
Untracked files:
Untracked: analysis/20230528_IdentifyBrianaPacBio.Rmd
Untracked: code/scripts/Plot_mbv_NoPredictedID.R
Unstaged changes:
Modified: analysis/MakeFinalFigs_Fig2.Rmd
Modified: code/Snakefile
Modified: code/config/ColocRunWildcards.tsv
Modified: code/rules/LongReads.smk
Modified: code/scripts/Plot_mbv.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/MakeFinalFigs_Fig2.Rmd
)
and HTML (docs/MakeFinalFigs_Fig2.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b8b6d6d | Benjmain Fair | 2023-05-24 | add new long table |
Rmd | 5946b5a | Benjmain Fair | 2023-05-22 | update site |
knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.7 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.0
✔ readr 2.1.2 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(RColorBrewer)
library(data.table)
Attaching package: 'data.table'
The following objects are masked from 'package:dplyr':
between, first, last
The following object is masked from 'package:purrr':
transpose
library(readxl)
library(qvalue)
library(ggrepel)
library(knitr)
library(ggbreak)
ggbreak v0.1.1
If you use ggbreak in published research, please cite the following
paper:
S Xu, M Chen, T Feng, L Zhan, L Zhou, G Yu. Use ggbreak to effectively
utilize plotting space to deal with large datasets and outliers.
Frontiers in Genetics. 2021, 12:774846. doi: 10.3389/fgene.2021.774846
# Set theme
theme_set(
theme_classic() +
theme(text=element_text(size=16, family="Helvetica")))
# I use layer a lot, to rotate long x-axis labels
Rotate_x_labels <- theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
PhenotypeWildcardsAll <- Sys.glob("../code/QTLs/QTLTools/*/PermutationPass.FDR_Added.txt.gz") %>%
str_replace("../code/QTLs/QTLTools/(.+?)/PermutationPass.FDR_Added.txt.gz", "\\1")
PhenotypeWildcards_ToIncluide <- c("APA_Nuclear", "APA_Total", "Expression.Splicing.Subset_YRI", "Expression.Splicing", "H3K27AC", "H3K36ME3", "H3K4ME3", "H3K4ME1", "MetabolicLabelled.30min", "MetabolicLabelled.60min", "chRNA.Expression.Splicing", "chRNA.Splicing", "polyA.Splicing", "polyA.Splicing.Subset_YRI", "MetabolicLabelled.30min.Splicing", "MetabolicLabelled.60min.Splicing")
GroupedPhenotypeWildcards_ToIncluide <- c( "chRNA.Splicing", "polyA.Splicing.Subset_YRI", "polyA.Splicing", "MetabolicLabelled.30min.Splicing", "MetabolicLabelled.60min.Splicing")
UngroupedPhenotypeWildcards_ToIncluide <- setdiff(PhenotypeWildcards_ToIncluide, GroupedPhenotypeWildcards_ToIncluide)
Grouped.dat <- setNames(paste0("../code/QTLs/QTLTools/",GroupedPhenotypeWildcards_ToIncluide,"/GroupedPermutationPass.FDR_Added.txt.gz"), GroupedPhenotypeWildcards_ToIncluide) %>%
lapply(fread) %>%
bind_rows(.id="PC")
Ungrouped.dat <- setNames(paste0("../code/QTLs/QTLTools/",UngroupedPhenotypeWildcards_ToIncluide,"/PermutationPass.FDR_Added.txt.gz"), UngroupedPhenotypeWildcards_ToIncluide) %>%
lapply(fread) %>%
bind_rows(.id="PC")
Full.dat <- bind_rows(
Grouped.dat, Ungrouped.dat
)
First plot number of individuals for each assay
SampleSize <- read_tsv("../code/QC/SampleSize.PerPhenotype.tsv", col_names=c("fn", "dummy", "n")) %>%
mutate(PC = str_replace(fn,"QTLs/QTLTools/(.+?)/OnlyFirstReps.sorted.qqnorm.bed.gz", "\\1")) %>%
filter(PC %in% c("Expression.Splicing", "Expression.Splicing.Subset_YRI", "chRNA.Expression.Splicing", "MetabolicLabelled.30min", "MetabolicLabelled.60min", "H3K27AC", "H3K4ME3", "H3K4ME1", "H3K36ME3", "APA_Nuclear", "APA_Total"))
SampleSize %>%
dplyr::select(PC, n) %>%
kable()
PC | n |
---|---|
Expression.Splicing | 462 |
Expression.Splicing.Subset_YRI | 89 |
chRNA.Expression.Splicing | 86 |
MetabolicLabelled.30min | 66 |
MetabolicLabelled.60min | 66 |
H3K27AC | 72 |
H3K4ME3 | 72 |
H3K4ME1 | 72 |
H3K36ME3 | 94 |
APA_Nuclear | 52 |
APA_Total | 52 |
SourcePublications <- setNames(c("Lappalainen et al", "Lappalainen et al", "This study", "Li et al", "Li et al", "Grubert et al", "Grubert et al", "Grubert et al", "This study", "Mittleman et al", "Mittleman et al"), SampleSize$PC)
Labels = setNames(c("polyA RNA-seq*", "polyA RNA-seq", "chRNA-seq", "30m 4sU RNA-seq", "60m 4sU RNA-seq", "H3K4me1 ChIP-seq", "H3K27Ac ChIP-seq", "H3K4me3 ChIP-seq", "H3K36me3 Cut&Tag", "nuclear RNA 3'seq", "Total RNA 3'seq"), SampleSize$PC)
SampleSizes <- SampleSize %>%
filter(!PC %in% "Expression.Splicing.Subset_YRI") %>%
mutate(publication = recode(PC, !!!SourcePublications)) %>%
mutate(Assay = factor(PC, levels=c("H3K4ME1", "H3K27AC", "H3K4ME3", "H3K36ME3", "chRNA.Expression.Splicing", "APA_Nuclear", "APA_Total", "MetabolicLabelled.30min","MetabolicLabelled.60min", "Expression.Splicing"))) %>%
ggplot(aes(x=Assay, y=n)) +
geom_col() +
# geom_text(aes(label=n), color="black") +
geom_text(aes(label=publication),angle=90, color="black", y=0, hjust=-0.05) +
scale_x_discrete(labels=Labels) +
scale_y_continuous(expand=c(0,0)) +
scale_y_break(c(100,425), scales="fixed", ticklabels=c(425, 450, 475), expand=F, space=0.4) +
Rotate_x_labels +
labs(x="Assay", y="Sample size", caption=str_wrap("*Includes samples non-YRI donors, though some supplementary analyses only utilize the 89 YRI samples in this dataset", 30)) +
theme(axis.text.x=element_text(size=rel(0.5)))
P.SampleSizes <- ggplotify::as.ggplot(print(SampleSizes))
P.SampleSizes
Now number of QTLs
relabel.df <- read_tsv("../data/Phenotypes_recode_for_Plotting.txt")
relabel.df %>% head()
# A tibble: 6 × 9
PC Alias ShorterAlias ShorterAlias2 Shortest Include Assay Alias_202305
<chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
1 APA_Nucl… APA_… APA_Nuclear APA APA QTL… TRUE Nucl… APA (Nuclea…
2 APA_Total APA_… APA_Total APA APA QTL… TRUE Tota… APA (Total …
3 chRNA.Ex… Expr… ncRNA_chRNA ncRNA_chRNA <NA> TRUE <NA> <NA>
4 chRNA.Ex… Expr… ncRNA_chRNA ncRNA_chRNA <NA> FALSE <NA> <NA>
5 chRNA.Ex… chRN… ncRNA_chRNA ncRNA_chRNA <NA> FALSE <NA> <NA>
6 chRNA.Ex… chRN… ncRNA_chRNA ncRNA_chRNA <NA> FALSE <NA> <NA>
# … with 1 more variable: OrderFactor <dbl>
Full.dat$PC %>% unique()
[1] "chRNA.Splicing" "polyA.Splicing.Subset_YRI"
[3] "polyA.Splicing" "MetabolicLabelled.30min.Splicing"
[5] "MetabolicLabelled.60min.Splicing" "APA_Nuclear"
[7] "APA_Total" "Expression.Splicing.Subset_YRI"
[9] "Expression.Splicing" "H3K27AC"
[11] "H3K36ME3" "H3K4ME3"
[13] "H3K4ME1" "MetabolicLabelled.30min"
[15] "MetabolicLabelled.60min" "chRNA.Expression.Splicing"
MolQTL_Labels <- relabel.df %>% dplyr::select(PC, Shortest) %>% deframe()
MolQTL_AssayLabels <- relabel.df %>% dplyr::select(PC, Assay) %>% deframe()
Factor_Order <- relabel.df %>% dplyr::select(PC, OrderFactor) %>% deframe()
NumQTLs.dat <- Full.dat %>%
filter(PC %in% c("APA_Total", "APA_Nuclear")) %>%
separate(phe_id, into=c("gene", "region", "peak"), sep="_") %>%
arrange(adj_emp_pval) %>%
group_by(gene) %>% slice(1) %>%
bind_rows(
Full.dat %>%
filter(!PC %in% c("APA_Total", "APA_Nuclear"))
) %>%
group_by(PC) %>%
summarise(
FDR_10 = sum(q<0.1, na.rm=T),
FDR_05 = sum(q<0.05, na.rm=T),
FDR_01 = sum(q<0.01, na.rm=T),
TestFeats=n()) %>%
gather("FDR", "n", -PC, -TestFeats) %>%
mutate(PC_relabeled = recode(PC, !!!MolQTL_Labels)) %>%
mutate(Assay_relabeled = recode(PC, !!!MolQTL_AssayLabels)) %>%
mutate(FinalLabels = paste(PC_relabeled, Assay_relabeled, sep="\n")) %>%
mutate(OrderFactor = recode(PC, !!!Factor_Order)) %>%
mutate(FinalLabels = fct_reorder(FinalLabels, OrderFactor))
Num.QTLs <- ggplot(NumQTLs.dat, aes(x=FinalLabels, y=n, fill=FDR)) +
geom_col(position="identity", width=0.8) +
Rotate_x_labels +
scale_y_continuous(trans="log10", breaks=10**(1:5), labels=c(10, 100, "1K", "10K", "100K")) +
# geom_text_repel(
# data = . %>%
# gather("FDR","n",-PC),
# aes(label=n, y=n), angle=90, hjust=-0.05,
# min.segment.length = unit(0, 'lines'),
# color="black") +
geom_text(
data = . %>%
filter(FDR== "FDR_10"),
aes(label=str_glue("{n} / {TestFeats}")), angle=90, hjust=0,
color="black") +
coord_cartesian(ylim=c(100,1E5)) +
scale_fill_manual(values=c("FDR_10"="#ffeda0", "FDR_05"="#feb24c", "FDR_01"="#f03b20"), labels=c("FDR_10"="10%", "FDR_05"="5%", "FDR_01"="1%")) +
# scale_x_discrete(labels=c())
labs(x="molQTL type, assay", y="Number QTLs", fill="FDR", caption=str_wrap("*Each sQTL/APA-QTL count here represents a likely independent genetic effect that may effect multiple introns/APA-sites within a local leafcutter cluster of introns/APA-sites", 40)) +
theme(axis.text.x = element_text(size=9, hjust=1, angle=70))
Num.QTLs
NumQTLs.dat %>% kable()
PC | TestFeats | FDR | n | PC_relabeled | Assay_relabeled | FinalLabels | OrderFactor |
---|---|---|---|---|---|---|---|
APA_Nuclear | 4941 | FDR_10 | 277 | APA QTLs* | Nuclear RNA 3’seq | APA QTLs* | |
Nuclear RNA 3’seq | 7.0 | ||||||
APA_Total | 2997 | FDR_10 | 133 | APA QTLs* | Total RNA 3’seq | APA QTLs* | |
Total RNA 3’seq | 14.0 | ||||||
Expression.Splicing | 14000 | FDR_10 | 11966 | Expression; eQTLs | polyA RNA-seq | Expression; eQTLs | |
polyA RNA-seq | 10.0 | ||||||
Expression.Splicing.Subset_YRI | 14000 | FDR_10 | 3970 | Expression; eQTLs (YRI only) | polyA RNA-seq | Expression; eQTLs (YRI only) | |
polyA RNA-seq | 11.0 | ||||||
H3K27AC | 100858 | FDR_10 | 10222 | H3K27Ac; hQTLs | H3K27Ac ChIP-seq | H3K27Ac; hQTLs | |
H3K27Ac ChIP-seq | 2.0 | ||||||
H3K36ME3 | 14000 | FDR_10 | 754 | H3K36me3; hQTLs | H3K36me3 ChIP-seq | H3K36me3; hQTLs | |
H3K36me3 ChIP-seq | 4.0 | ||||||
H3K4ME1 | 182749 | FDR_10 | 6328 | H3K4me1; hQTLs | H3K4me1 ChIP-seq | H3K4me1; hQTLs | |
H3K4me1 ChIP-seq | 1.0 | ||||||
H3K4ME3 | 55769 | FDR_10 | 4602 | H3K4me3; hQTLs | H3K4me3 ChIP-seq | H3K4me3; hQTLs | |
H3K4me3 ChIP-seq | 3.0 | ||||||
MetabolicLabelled.30min | 14000 | FDR_10 | 955 | Expression; eQTLs | 30min 4sU RNA-seq | Expression; eQTLs | |
30min 4sU RNA-seq | 8.5 | ||||||
MetabolicLabelled.30min.Splicing | 35099 | FDR_10 | 390 | Splicing | 30min 4sU RNA-seq | Splicing | |
30min 4sU RNA-seq | 8.0 | ||||||
MetabolicLabelled.60min | 13993 | FDR_10 | 960 | Expression; eQTLs | 60min 4sU RNA-seq | Expression; eQTLs | |
60min 4sU RNA-seq | 9.0 | ||||||
MetabolicLabelled.60min.Splicing | 35648 | FDR_10 | 464 | Splicing | 60min 4sU RNA-seq | Splicing | |
60min 4sU RNA-seq | 9.5 | ||||||
chRNA.Expression.Splicing | 14000 | FDR_10 | 2976 | Expression; eQTLs | chRNA-seq | Expression; eQTLs | |
chRNA-seq | 5.0 | ||||||
chRNA.Splicing | 41406 | FDR_10 | 3052 | Splicing; sQTLs* | chRNA-seq | Splicing; sQTLs* | |
chRNA-seq | 6.0 | ||||||
polyA.Splicing | 38280 | FDR_10 | 8637 | Splicing; sQTLs* | polyA RNA-seq | Splicing; sQTLs* | |
polyA RNA-seq | 12.0 | ||||||
polyA.Splicing.Subset_YRI | 38280 | FDR_10 | 2295 | Splicing; sQTLs* (YRI only) | polyA RNA-seq | Splicing; sQTLs* (YRI only) | |
polyA RNA-seq | 13.0 | ||||||
APA_Nuclear | 4941 | FDR_05 | 217 | APA QTLs* | Nuclear RNA 3’seq | APA QTLs* | |
Nuclear RNA 3’seq | 7.0 | ||||||
APA_Total | 2997 | FDR_05 | 84 | APA QTLs* | Total RNA 3’seq | APA QTLs* | |
Total RNA 3’seq | 14.0 | ||||||
Expression.Splicing | 14000 | FDR_05 | 10403 | Expression; eQTLs | polyA RNA-seq | Expression; eQTLs | |
polyA RNA-seq | 10.0 | ||||||
Expression.Splicing.Subset_YRI | 14000 | FDR_05 | 2834 | Expression; eQTLs (YRI only) | polyA RNA-seq | Expression; eQTLs (YRI only) | |
polyA RNA-seq | 11.0 | ||||||
H3K27AC | 100858 | FDR_05 | 7380 | H3K27Ac; hQTLs | H3K27Ac ChIP-seq | H3K27Ac; hQTLs | |
H3K27Ac ChIP-seq | 2.0 | ||||||
H3K36ME3 | 14000 | FDR_05 | 540 | H3K36me3; hQTLs | H3K36me3 ChIP-seq | H3K36me3; hQTLs | |
H3K36me3 ChIP-seq | 4.0 | ||||||
H3K4ME1 | 182749 | FDR_05 | 4383 | H3K4me1; hQTLs | H3K4me1 ChIP-seq | H3K4me1; hQTLs | |
H3K4me1 ChIP-seq | 1.0 | ||||||
H3K4ME3 | 55769 | FDR_05 | 3246 | H3K4me3; hQTLs | H3K4me3 ChIP-seq | H3K4me3; hQTLs | |
H3K4me3 ChIP-seq | 3.0 | ||||||
MetabolicLabelled.30min | 14000 | FDR_05 | 596 | Expression; eQTLs | 30min 4sU RNA-seq | Expression; eQTLs | |
30min 4sU RNA-seq | 8.5 | ||||||
MetabolicLabelled.30min.Splicing | 35099 | FDR_05 | 274 | Splicing | 30min 4sU RNA-seq | Splicing | |
30min 4sU RNA-seq | 8.0 | ||||||
MetabolicLabelled.60min | 13993 | FDR_05 | 640 | Expression; eQTLs | 60min 4sU RNA-seq | Expression; eQTLs | |
60min 4sU RNA-seq | 9.0 | ||||||
MetabolicLabelled.60min.Splicing | 35648 | FDR_05 | 362 | Splicing | 60min 4sU RNA-seq | Splicing | |
60min 4sU RNA-seq | 9.5 | ||||||
chRNA.Expression.Splicing | 14000 | FDR_05 | 2039 | Expression; eQTLs | chRNA-seq | Expression; eQTLs | |
chRNA-seq | 5.0 | ||||||
chRNA.Splicing | 41406 | FDR_05 | 2345 | Splicing; sQTLs* | chRNA-seq | Splicing; sQTLs* | |
chRNA-seq | 6.0 | ||||||
polyA.Splicing | 38280 | FDR_05 | 7344 | Splicing; sQTLs* | polyA RNA-seq | Splicing; sQTLs* | |
polyA RNA-seq | 12.0 | ||||||
polyA.Splicing.Subset_YRI | 38280 | FDR_05 | 1766 | Splicing; sQTLs* (YRI only) | polyA RNA-seq | Splicing; sQTLs* (YRI only) | |
polyA RNA-seq | 13.0 | ||||||
APA_Nuclear | 4941 | FDR_01 | 114 | APA QTLs* | Nuclear RNA 3’seq | APA QTLs* | |
Nuclear RNA 3’seq | 7.0 | ||||||
APA_Total | 2997 | FDR_01 | 32 | APA QTLs* | Total RNA 3’seq | APA QTLs* | |
Total RNA 3’seq | 14.0 | ||||||
Expression.Splicing | 14000 | FDR_01 | 8172 | Expression; eQTLs | polyA RNA-seq | Expression; eQTLs | |
polyA RNA-seq | 10.0 | ||||||
Expression.Splicing.Subset_YRI | 14000 | FDR_01 | 1581 | Expression; eQTLs (YRI only) | polyA RNA-seq | Expression; eQTLs (YRI only) | |
polyA RNA-seq | 11.0 | ||||||
H3K27AC | 100858 | FDR_01 | 4394 | H3K27Ac; hQTLs | H3K27Ac ChIP-seq | H3K27Ac; hQTLs | |
H3K27Ac ChIP-seq | 2.0 | ||||||
H3K36ME3 | 14000 | FDR_01 | 292 | H3K36me3; hQTLs | H3K36me3 ChIP-seq | H3K36me3; hQTLs | |
H3K36me3 ChIP-seq | 4.0 | ||||||
H3K4ME1 | 182749 | FDR_01 | 2422 | H3K4me1; hQTLs | H3K4me1 ChIP-seq | H3K4me1; hQTLs | |
H3K4me1 ChIP-seq | 1.0 | ||||||
H3K4ME3 | 55769 | FDR_01 | 1903 | H3K4me3; hQTLs | H3K4me3 ChIP-seq | H3K4me3; hQTLs | |
H3K4me3 ChIP-seq | 3.0 | ||||||
MetabolicLabelled.30min | 14000 | FDR_01 | 285 | Expression; eQTLs | 30min 4sU RNA-seq | Expression; eQTLs | |
30min 4sU RNA-seq | 8.5 | ||||||
MetabolicLabelled.30min.Splicing | 35099 | FDR_01 | 160 | Splicing | 30min 4sU RNA-seq | Splicing | |
30min 4sU RNA-seq | 8.0 | ||||||
MetabolicLabelled.60min | 13993 | FDR_01 | 302 | Expression; eQTLs | 60min 4sU RNA-seq | Expression; eQTLs | |
60min 4sU RNA-seq | 9.0 | ||||||
MetabolicLabelled.60min.Splicing | 35648 | FDR_01 | 217 | Splicing | 60min 4sU RNA-seq | Splicing | |
60min 4sU RNA-seq | 9.5 | ||||||
chRNA.Expression.Splicing | 14000 | FDR_01 | 1162 | Expression; eQTLs | chRNA-seq | Expression; eQTLs | |
chRNA-seq | 5.0 | ||||||
chRNA.Splicing | 41406 | FDR_01 | 1661 | Splicing; sQTLs* | chRNA-seq | Splicing; sQTLs* | |
chRNA-seq | 6.0 | ||||||
polyA.Splicing | 38280 | FDR_01 | 5684 | Splicing; sQTLs* | polyA RNA-seq | Splicing; sQTLs* | |
polyA RNA-seq | 12.0 | ||||||
polyA.Splicing.Subset_YRI | 38280 | FDR_01 | 1178 | Splicing; sQTLs* (YRI only) | polyA RNA-seq | Splicing; sQTLs* (YRI only) | |
polyA RNA-seq | 13.0 |
NumQTLs.dat %>%
filter(FDR=="FDR_10") %>%
pull(n) %>% sum()
[1] 57981
NumQTLs.dat %>%
filter(FDR=="FDR_10") %>%
pull(TestFeats) %>% sum()
[1] 620020
first do some quick data searching to figure out exactly the names of the phenotypes I want to make boxpltos for…
dat <- fread("../code/pi1/PairwisePi1Traits.P.all.txt.gz")
PeaksToTSS <- Sys.glob("../code/Misc/PeaksClosestToTSS/*_assigned.tsv.gz") %>%
setNames(str_replace(., "../code/Misc/PeaksClosestToTSS/(.+?)_assigned.tsv.gz", "\\1")) %>%
lapply(read_tsv) %>%
bind_rows(.id="ChromatinMark") %>%
mutate(GenePeakPair = paste(gene, peak, sep = ";")) %>%
distinct(ChromatinMark, peak, gene, .keep_all=T)
PeaksToTSS %>%
filter(ChromatinMark == "H3K4ME3") %>%
filter(str_detect(gene, "ENSG00000075234"))
dat %>%
filter(PC1 == "Expression.Splicing") %>%
filter(str_detect(P1, "ENSG00000075234")) %>%
# pull(PC2) %>% unique()
filter(PC2 == "polyA.Splicing")
dat %>%
filter(PC1 == "Expression.Splicing") %>%
filter(str_detect(P1, "ENSG00000075234")) %>%
filter(P2 %in% c("H3K4ME3_peak_31179", "22:46292328:46292791:clu_48515_+"))
So I’m looking to make boxplots for H3K4ME3_peak_31179 (hQTL), ENSG00000075234.17 (polyA eQTL), and chRNA sQTL and polyA sQTL for 22:46292328:46292791:clu_48515_+
TopSNP: “22:46291323:G:A”
Now use my helper script to get genotype/phenotype data for the top eSNP for those traits..
cd ../code/scripts/GenometracksByGenotype
python ExtractPhenotypeBedByGenotype.py --Bed ../../QTLs/QTLTools/Expression.Splicing/OnlyFirstReps.sorted.qqnorm.bed.gz --VCF ../../Genotypes/1KG_GRCh38/Autosomes.vcf.gz --FeatureName ENSG00000075234.17 --SnpPos chr22:46291323 > ../../scratch/Boxplot.dat.eQTL.tsv
python ExtractPhenotypeBedByGenotype.py --Bed ../../QTLs/QTLTools/polyA.Splicing/OnlyFirstReps.sorted.qqnorm.bed.gz --VCF ../../Genotypes/1KG_GRCh38/Autosomes.vcf.gz --FeatureName 22:46292328:46292791:clu_48515_+ --SnpPos chr22:46291323 > ../../scratch/Boxplot.dat.polyAsQTL.tsv
python ExtractPhenotypeBedByGenotype.py --Bed ../../QTLs/QTLTools/polyA.Splicing/OnlyFirstReps.sorted.qqnorm.bed.gz --VCF ../../Genotypes/1KG_GRCh38/Autosomes.vcf.gz --FeatureName 22:46292328:46292791:clu_48515_+ --SnpPos chr22:46291323 > ../../scratch/Boxplot.dat.polyAsQTL.tsv
python ExtractPhenotypeBedByGenotype.py --Bed ../../QTLs/QTLTools/H3K4ME3/OnlyFirstReps.sorted.qqnorm.bed.gz --VCF ../../Genotypes/1KG_GRCh38/Autosomes.vcf.gz --FeatureName H3K4ME3_peak_31179 --SnpPos chr22:46291323 > ../../scratch/Boxplot.dat.PromoterhQTL.tsv
Now let’s read in that data and do boxplots…
dat.for.boxplots <- Sys.glob("../code/scratch/Boxplot.dat.*.tsv") %>%
setNames(str_replace(., "../code/scratch/Boxplot.dat.(.+?).tsv", "\\1")) %>%
lapply(read_tsv, col_names=c("SampleID", "P", "genotype", "Ref", "Alt", "ID"), skip=1) %>%
bind_rows(.id="PC")
dat.for.boxplots %>%
filter(!genotype==3) %>%
mutate(genotype= factor(genotype, levels=c(0,1,2))) %>%
ggplot(aes(x=genotype, y=P, group=genotype)) +
geom_boxplot(outlier.shape = NA, aes(color=genotype)) +
# geom_jitter(width=0.25, alpha=0.1) +
geom_smooth(method='lm', aes(group=1), se=F, color="gray") +
scale_x_discrete(labels=c("0"="GG", "1"="GA", "2"="AA")) +
scale_color_manual(values=c("0"="#f03b20", "1"="#7a0177", "2"="#08519c")) +
# scale_x_discrete() +
facet_wrap(~PC) +
theme(legend.position="none") +
labs(x=NULL, color=NULL)
dat.for.boxplots$PC %>% unique()
dat.for.boxplots %>%
filter(!genotype==3) %>%
filter(PC=="PromoterhQTL") %>%
mutate(genotype= factor(genotype, levels=c(0,1,2))) %>%
ggplot(aes(x=genotype, y=P, group=genotype)) +
geom_boxplot(outlier.shape = NA, aes(color=genotype)) +
# geom_jitter(width=0.25, alpha=0.1) +
geom_smooth(method='lm', aes(group=1), se=F, color="gray") +
scale_x_discrete(labels=c("0"="GG", "1"="GA", "2"="AA")) +
scale_color_manual(values=c("0"="#f03b20", "1"="#7a0177", "2"="#08519c")) +
geom_text(x=-Inf, y=Inf, label=" beta=0.15\n P=0.48", hjust=0, vjust=1, size=5.5) +
# scale_x_discrete() +
theme(legend.position="none") +
Rotate_x_labels +
labs(x=NULL, color=NULL, y="H3K4me3 @ promoter")
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig2_ExampleBoxplot_hQTL.pdf", height=3, width=2)
dat.for.boxplots %>%
filter(!genotype==3) %>%
filter(PC=="eQTL") %>%
mutate(genotype= factor(genotype, levels=c(0,1,2))) %>%
ggplot(aes(x=genotype, y=P, group=genotype)) +
geom_boxplot(outlier.shape = NA, aes(color=genotype)) +
# geom_jitter(width=0.25, alpha=0.1) +
geom_smooth(method='lm', aes(group=1), se=F, color="gray") +
scale_x_discrete(labels=c("0"="GG", "1"="GA", "2"="AA")) +
scale_color_manual(values=c("0"="#f03b20", "1"="#7a0177", "2"="#08519c")) +
geom_text(x=-Inf, y=Inf, label=" beta=1.1\n P<5E-43", hjust=0, vjust=1, size=5.5) +
# scale_x_discrete() +
theme(legend.position="none") +
Rotate_x_labels +
labs(x=NULL, color=NULL, y="TTC38 expression")
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig2_ExampleBoxplot_eQTL.pdf", height=3, width=2)
dat.for.boxplots %>%
filter(!genotype==3) %>%
filter(PC=="polyAsQTL") %>%
mutate(genotype= factor(genotype, levels=c(0,1,2))) %>%
ggplot(aes(x=genotype, y=P, group=genotype)) +
geom_boxplot(outlier.shape = NA, aes(color=genotype)) +
# geom_jitter(width=0.25, alpha=0.1) +
geom_smooth(method='lm', aes(group=1), se=F, color="gray") +
scale_x_discrete(labels=c("0"="GG", "1"="GA", "2"="AA")) +
scale_color_manual(values=c("0"="#f03b20", "1"="#7a0177", "2"="#08519c")) +
geom_text(x=-Inf, y=Inf, label=" beta=-1.1\n P<3e-24", hjust=0, vjust=1, size=5.5) +
# scale_x_discrete() +
theme(legend.position="none") +
Rotate_x_labels +
labs(x=NULL, color=NULL, y="Splicing of cassette exon")
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig2_ExampleBoxplot_polyAsQTL.pdf", height=3, width=2)
dat.for.boxplots %>%
filter(!genotype==3) %>%
filter(PC=="chRNAsQTL") %>%
mutate(genotype= factor(genotype, levels=c(0,1,2))) %>%
ggplot(aes(x=genotype, y=P, group=genotype)) +
geom_boxplot(outlier.shape = NA, aes(color=genotype)) +
# geom_jitter(width=0.25, alpha=0.1) +
geom_smooth(method='lm', aes(group=1), se=F, color="gray") +
scale_x_discrete(labels=c("0"="GG", "1"="GA", "2"="AA")) +
scale_color_manual(values=c("0"="#f03b20", "1"="#7a0177", "2"="#08519c")) +
geom_text(x=-Inf, y=Inf, label=" beta=-1.3\n P<4e-19", hjust=0, vjust=1, size=5.5) +
# scale_x_discrete() +
theme(legend.position="none") +
Rotate_x_labels +
labs(x=NULL, color=NULL, y="Splicing of cassette exon")
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig2_ExampleBoxplot_chRNAsQTL.pdf", height=3, width=2)
Actually a supplement related to fig1
juncs.long <- fread("../code/SplicingAnalysis/CombinedJuncTables/YRI.tsv.gz")
IntronAnnotations <- read_tsv("../data/IntronAnnotationsFromYang.tsv.gz") %>%
mutate(stop = end-1)
juncs.long.summary <- juncs.long %>%
dplyr::select(chrom, start, stop, strand, Dataset, Count) %>%
inner_join(IntronAnnotations) %>%
group_by(Dataset, SemiSupergroupAnnotations, SuperAnnotation) %>%
summarise(Sum=sum(Count))
P.PercentAnnotated <- juncs.long.summary %>%
ungroup() %>%
filter(Dataset %in% c("Expression.Splicing", "chRNA.Expression.Splicing")) %>%
filter(!str_detect(SuperAnnotation, "Noncoding")) %>%
group_by(Dataset, SuperAnnotation) %>%
summarise(TotalSum = sum(Sum)) %>%
ungroup() %>%
mutate(PlotGroup = if_else(SuperAnnotation=="AnnotatedJunc_ProductiveCodingGene", "Annotated productive (basic tag)", "Other")) %>%
group_by(Dataset) %>%
mutate(DatasetSum = sum(TotalSum)) %>%
ungroup() %>%
group_by(Dataset, PlotGroup) %>%
summarise(Percent=sum(TotalSum)/DatasetSum) %>%
ungroup() %>%
distinct() %>%
mutate(Dataset = factor(Dataset, levels=c("Expression.Splicing", "chRNA.Expression.Splicing"))) %>%
ggplot(aes(x=Dataset, y=Percent*100, fill=PlotGroup)) +
geom_col() +
scale_x_discrete(labels=c("Expression.Splicing"="polyA RNA", "chRNA.Expression.Splicing"="chRNA")) +
scale_y_continuous(expand=c(0,0)) +
scale_fill_manual(values=c("Annotated productive (basic tag)"="#1f78b4", "Other"="#969696")) +
scale_y_break(c(5, 95), expand=F, space=0.5) +
labs(y="Percent of splice junction reads", fill="Intron classification") +
theme(legend.position="none")
P.PercentAnnotated <- ggplotify::as.ggplot(print(P.PercentAnnotated))
P.PercentAnnotated
P.PercentOfUnannotated <- juncs.long.summary %>%
ungroup() %>%
filter(Dataset %in% c("chRNA.Expression.Splicing")) %>%
filter(!str_detect(SuperAnnotation, "Noncoding")) %>%
filter(SuperAnnotation!="AnnotatedJunc_ProductiveCodingGene") %>%
mutate(PercentOfNon_AnnotatedProductive = Sum/sum(Sum)*100) %>%
arrange(desc(SuperAnnotation), PercentOfNon_AnnotatedProductive) %>%
mutate(n=row_number()) %>%
mutate(SemiSupergroupAnnotations=fct_reorder(SemiSupergroupAnnotations, n)) %>%
ggplot(aes(x=Dataset, y=PercentOfNon_AnnotatedProductive, fill=SemiSupergroupAnnotations)) +
geom_col()
P.PercentOfUnannotated
One more version… with fixed scale… borrowed code from here
position_stack_and_nudge <- function(x = 0, y = 0, vjust = 1, reverse = FALSE) {
ggproto(NULL, PositionStackAndNudge,
x = x,
y = y,
vjust = vjust,
reverse = reverse
)
}
#' @rdname ggplot2-ggproto
#' @format NULL
#' @usage NULL
#' @noRd
PositionStackAndNudge <- ggproto("PositionStackAndNudge", PositionStack,
x = 0,
y = 0,
setup_params = function(self, data) {
c(
list(x = self$x, y = self$y),
ggproto_parent(PositionStack, self)$setup_params(data)
)
},
compute_layer = function(self, data, params, panel) {
# operate on the stacked positions (updated in August 2020)
data = ggproto_parent(PositionStack, self)$compute_layer(data, params, panel)
x_orig <- data$x
y_orig <- data$y
# transform only the dimensions for which non-zero nudging is requested
if (any(params$x != 0)) {
if (any(params$y != 0)) {
data <- transform_position(data, function(x) x + params$x, function(y) y + params$y)
} else {
data <- transform_position(data, function(x) x + params$x, NULL)
}
} else if (any(params$y != 0)) {
data <- transform_position(data, function(x) x, function(y) y + params$y)
}
data$nudge_x <- data$x
data$nudge_y <- data$y
data$x <- x_orig
data$y <- y_orig
data
},
compute_panel = function(self, data, params, scales) {
ggproto_parent(PositionStack, self)$compute_panel(data, params, scales)
}
)
AnnotationColors <- IntronAnnotations %>%
distinct(SemiSupergroupAnnotations, SuperAnnotation) %>%
mutate(Color = recode(SuperAnnotation, !!!c(
"AnnotatedJunc_NoncodingGene"="#6a3d9a",
"UnannotatedJunc_NoncodingGene"="#cab2d6",
"AnnotatedJunc_ProductiveCodingGene"="#1f78b4",
"UnannotatedJunc_ProductiveCodingGene"="#a6cee3",
"AnnotatedJunc_UnproductiveCodingGene"="#e31a1c",
"UnannotatedJunc_UnproductiveCodingGene"="#fb9a99")))
P.PercentOfUnannotated <- juncs.long.summary %>%
ungroup() %>%
filter(Dataset %in% c("chRNA.Expression.Splicing")) %>%
filter(!str_detect(SuperAnnotation, "Noncoding")) %>%
filter(SuperAnnotation!="AnnotatedJunc_ProductiveCodingGene") %>%
mutate(PercentOfNon_AnnotatedProductive = Sum/sum(Sum)*100) %>%
mutate(SemiSupergroupAnnotations=if_else(Sum <= 996286 & SuperAnnotation=="UnannotatedJunc_UnproductiveCodingGene", "Other predicted non-productive", SemiSupergroupAnnotations)) %>%
group_by(Dataset, SemiSupergroupAnnotations, SuperAnnotation) %>%
summarise(PercentOfNon_AnnotatedProductive=sum(PercentOfNon_AnnotatedProductive)) %>%
ungroup() %>%
arrange(desc(SuperAnnotation), PercentOfNon_AnnotatedProductive) %>%
mutate(n=row_number()) %>%
mutate(SemiSupergroupAnnotations=fct_reorder(SemiSupergroupAnnotations, n)) %>%
ggplot(aes(x=Dataset, y=PercentOfNon_AnnotatedProductive, fill=SemiSupergroupAnnotations)) +
geom_col(color='black') +
geom_text_repel(
aes(label = SemiSupergroupAnnotations, x=1.4),
max.overlaps = Inf,
min.segment.length=0,
label.padding=1,
position = position_stack_and_nudge(vjust = 0.5, x = 1),
direction = 'both',
arrow=NULL
) +
scale_y_continuous(expand=c(0,0)) +
scale_x_discrete(expand=c(0,0)) +
scale_fill_manual(values=AnnotationColors %>% dplyr::select(SemiSupergroupAnnotations, Color) %>% deframe(), na.value="#fb9a99") +
theme(legend.position='none') +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
labs(y="Percent junction reads amongst other")
P.PercentOfUnannotated
P.PercentJuncsLegend.dat <-
AnnotationColors %>%
distinct(SuperAnnotation, .keep_all=T) %>%
filter(!str_detect(SuperAnnotation, "Noncoding"))
P.PercentJuncsLegend <- ggplot(P.PercentJuncsLegend.dat, aes(x=SuperAnnotation, y=1, fill=SuperAnnotation)) +
geom_col() +
scale_fill_manual(values=P.PercentJuncsLegend.dat %>%
dplyr::select(SuperAnnotation, Color) %>% deframe(), labels=c("AnnotatedJunc_UnproductiveCodingGene"="Annotated unproductive", "AnnotatedJunc_ProductiveCodingGene"="Annotated productive", "UnannotatedJunc_ProductiveCodingGene"="Unannotated unproductive", "UnannotatedJunc_UnproductiveCodingGene"="Unannotated productive")) +
labs(fill="Intron classification")
Number eQTLs per gene, hQTLs, sQTLs
dat <- fread("../code/pi1/PairwisePi1Traits.P.all.txt.gz")
dat$PC2 %>% unique()
[1] "Expression.Splicing.Subset_YRI" "chRNA.Expression.Splicing"
[3] "MetabolicLabelled.30min" "MetabolicLabelled.60min"
[5] "H3K27AC" "H3K4ME3"
[7] "H3K4ME1" "H3K36ME3"
[9] "APA_Nuclear" "APA_Total"
[11] "polyA.Splicing" "polyA.Splicing.Subset_YRI"
[13] "chRNA.Splicing" "ProCap"
[15] "Expression.Splicing"
NumMolQTLsPerGene <- dat %>%
filter(PC1=="Expression.Splicing" & FDR.x < 0.1) %>%
mutate(P2 = case_when(
PC2 %in% c("polyA.Splicing", "chRNA.Splicing", "polyA.Splicing.Subset_YRI") ~ str_extract(P2, "clu_.+$"),
TRUE ~ P2
)) %>%
group_by(PC1, PC2, P2, GeneLocus) %>%
slice(which.min(FDR.y)) %>%
ungroup() %>%
mutate(IsPc2SigFDR = FDR.y < 0.1) %>%
mutate(IsPc2SigPermutationP = p_permutation.y < 0.01) %>%
group_by(PC2, GeneLocus) %>%
summarise(Num_PC2_QTLs_FDR = sum(IsPc2SigFDR),
Num_PC2_QTLs_PermutationP = sum(IsPc2SigPermutationP)) %>%
ungroup()
NumMolQTLsPerGene %>%
filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
ggplot(aes(x=Num_PC2_QTLs_PermutationP)) +
stat_ecdf() +
facet_wrap(~PC2) +
coord_cartesian(xlim=c(0,5))
NumMolQTLsPerGene %>%
filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
ggplot(aes(x=Num_PC2_QTLs_FDR)) +
stat_ecdf() +
facet_wrap(~PC2) +
coord_cartesian(xlim=c(0,5))
NumMolQTLsPerGene %>%
ggplot(aes(x=Num_PC2_QTLs_PermutationP)) +
stat_ecdf() +
facet_wrap(~PC2) +
coord_cartesian(xlim=c(0,5))
Ok, now let’s plot that nicer for saving…
eGenes <- dat %>%
filter(PC1=="Expression.Splicing" & FDR.x < 0.1) %>%
pull(P1) %>% unique()
P.Number_hQTLsForColoc <- NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="H3K27AC") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n) %>%
ggplot(aes(x=n, y=nn)) +
geom_col() +
scale_x_continuous(breaks=0:5, labels=c(0:4, ">4")) +
labs(x="Number H3K27AC hQTLs w/in 100kb of gene", y="Number eGenes")
NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="H3K27AC") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n)
# A tibble: 6 × 3
PC2 n nn
<chr> <int> <int>
1 H3K27AC 0 4241
2 H3K27AC 1 3313
3 H3K27AC 2 1942
4 H3K27AC 3 1064
5 H3K27AC 4 608
6 H3K27AC 5 756
P.Number_sQTLsForColoc <- NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="polyA.Splicing") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n) %>%
ggplot(aes(x=n, y=nn)) +
geom_col() +
scale_x_continuous(breaks=0:5, labels=c(0:4, ">4")) +
labs(x="Number sQTLs within gene", y="Number eGenes")
NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="polyA.Splicing") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n)
# A tibble: 6 × 3
PC2 n nn
<chr> <int> <int>
1 polyA.Splicing 0 4895
2 polyA.Splicing 1 2926
3 polyA.Splicing 2 1022
4 polyA.Splicing 3 305
5 polyA.Splicing 4 104
6 polyA.Splicing 5 124
P.Number_hQTLsForColoc
P.Number_sQTLsForColoc
Same plots, but only considering eGenes discovered in YRI subset
eGenes <- dat %>%
filter(PC1=="Expression.Splicing.Subset_YRI" & FDR.x < 0.1) %>%
pull(P1) %>% unique()
P.Number_hQTLsForColoc <- NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="H3K27AC") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n) %>%
ggplot(aes(x=n, y=nn)) +
geom_col() +
scale_x_continuous(breaks=0:5, labels=c(0:4, ">4")) +
scale_y_continuous(expand=c(0,0)) +
labs(x="Number H3K27AC hQTLs w/in 100kb of gene", y="Number eGenes")
NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="H3K27AC") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n)
# A tibble: 6 × 3
PC2 n nn
<chr> <int> <int>
1 H3K27AC 0 1193
2 H3K27AC 1 1032
3 H3K27AC 2 668
4 H3K27AC 3 374
5 H3K27AC 4 238
6 H3K27AC 5 375
P.Number_sQTLsForColoc <- NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="polyA.Splicing") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n) %>%
ggplot(aes(x=n, y=nn)) +
geom_col() +
scale_x_continuous(breaks=0:5, labels=c(0:4, ">4")) +
scale_y_continuous(expand=c(0,0)) +
labs(x="Number sQTLs within gene", y="Number eGenes")
NumMolQTLsPerGene %>%
filter(GeneLocus %in% eGenes) %>%
# filter(PC2 %in% c("H3K27AC", "polyA.Splicing")) %>%
filter(PC2=="polyA.Splicing") %>%
mutate(n = if_else(Num_PC2_QTLs_PermutationP>5, as.integer(5), Num_PC2_QTLs_PermutationP)) %>%
count(PC2, n)
# A tibble: 6 × 3
PC2 n nn
<chr> <int> <int>
1 polyA.Splicing 0 1411
2 polyA.Splicing 1 1050
3 polyA.Splicing 2 467
4 polyA.Splicing 3 168
5 polyA.Splicing 4 68
6 polyA.Splicing 5 85
P.Number_hQTLsForColoc
P.Number_sQTLsForColoc
coloc.dat <- read_tsv("../code/hyprcoloc/Results/ForColoc/MolColocNonRedundantFullSplicing/tidy_results_OnlyColocalized.txt.gz") %>%
separate(phenotype_full, into=c("PC", "P"), sep=";")
Dat.TopQTLInFamily <- dat %>%
filter(PC1=="Expression.Splicing" & FDR.x < 0.1) %>%
mutate(P2 = case_when(
PC2 %in% c("polyA.Splicing", "chRNA.Splicing", "polyA.Splicing.Subset_YRI") ~ str_extract(P2, "clu_.+$"),
TRUE ~ P2
)) %>%
group_by(PC1, PC2, P2, GeneLocus) %>%
slice(which.min(FDR.y)) %>%
ungroup() %>%
group_by(PC1, PC2, GeneLocus) %>%
slice(which.min(FDR.y)) %>%
ungroup()
coloc.dat %>%
mutate(P = case_when(
PC %in% c("polyA.Splicing", "chRNA.Splicing", "polyA.Splicing.Subset_YRI") ~ str_extract(P, "clu_.+$"),
TRUE ~ P
)) %>%
left_join(
Dat.TopQTLInFamily %>%
dplyr::select(Locus=GeneLocus, PC=PC2, P=P2) %>%
mutate(TopMolQTLAroundGene = T)
) %>%
replace_na(list(TopMolQTLAroundGene=F)) %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI")) %>%
ungroup() %>%
group_by(Locus, PC) %>%
summarise(ContainsTopMolQTL = any(TopMolQTLAroundGene)) %>%
ungroup() %>%
group_by(PC) %>%
summarise(FracContainsTop = sum(ContainsTopMolQTL)/n()) %>%
ungroup()
# A tibble: 21 × 2
PC FracContainsTop
<chr> <dbl>
1 APA_Nuclear 0.939
2 APA_Total 1
3 CTCF 0
4 Expression.Splicing.Subset_YRI 0.999
5 H3K27AC 0.701
6 H3K36ME3 1
7 H3K4ME1 0.612
8 H3K4ME3 0.763
9 MetabolicLabelled.30min 1
10 MetabolicLabelled.30min.IER 0
# … with 11 more rows
Same analysis without renaming to match all in cluster
Dat.TopQTLInFamily2 <- dat %>%
filter(PC1=="Expression.Splicing" & FDR.x < 0.1) %>%
# mutate(P2 = case_when(
# PC2 %in% c("polyA.Splicing", "chRNA.Splicing", "polyA.Splicing.Subset_YRI") ~ str_extract(P2, "clu_.+$"),
# TRUE ~ P2
# )) %>%
# group_by(PC1, PC2, P2, GeneLocus) %>%
# slice(which.min(FDR.y)) %>%
# ungroup() %>%
group_by(PC1, PC2, GeneLocus) %>%
slice(which.min(FDR.y)) %>%
ungroup()
dat.FractionNonPrimaryMolQTLColocs <- coloc.dat %>%
# mutate(P = case_when(
# PC %in% c("polyA.Splicing", "chRNA.Splicing", "polyA.Splicing.Subset_YRI") ~ str_extract(P, "clu_.+$"),
# TRUE ~ P
# )) %>%
left_join(
Dat.TopQTLInFamily2 %>%
dplyr::select(Locus=GeneLocus, PC=PC2, P=P2) %>%
mutate(TopMolQTLAroundGene = T)
) %>%
replace_na(list(TopMolQTLAroundGene=F)) %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI")) %>%
ungroup() %>%
group_by(Locus, PC) %>%
summarise(ContainsTopMolQTL = any(TopMolQTLAroundGene)) %>%
ungroup() %>%
group_by(PC) %>%
summarise(FracContainsTop = sum(ContainsTopMolQTL)/n()) %>%
ungroup()
kable(dat.FractionNonPrimaryMolQTLColocs)
PC | FracContainsTop |
---|---|
APA_Nuclear | 0.9393939 |
APA_Total | 1.0000000 |
CTCF | 0.0000000 |
Expression.Splicing.Subset_YRI | 0.9992717 |
H3K27AC | 0.7008850 |
H3K36ME3 | 1.0000000 |
H3K4ME1 | 0.6124661 |
H3K4ME3 | 0.7628866 |
MetabolicLabelled.30min | 1.0000000 |
MetabolicLabelled.30min.IER | 0.0000000 |
MetabolicLabelled.30min.Splicing | 0.0000000 |
MetabolicLabelled.60min | 1.0000000 |
MetabolicLabelled.60min.IER | 0.0000000 |
MetabolicLabelled.60min.Splicing | 0.0000000 |
ProCap | 0.7437811 |
chRNA.Expression.Splicing | 1.0000000 |
chRNA.Expression_ncRNA | 0.0000000 |
chRNA.IER | 0.0000000 |
chRNA.Splicing | 0.7853403 |
polyA.IER | 0.0000000 |
polyA.Splicing | 0.7064057 |
P.NonPrimaryMolQTLs_OftenColoc <- dat.FractionNonPrimaryMolQTLColocs %>%
filter(PC %in% c("H3K27AC", "polyA.Splicing")) %>%
mutate(Complement = 1-FracContainsTop) %>%
gather("ContainsTop", "Fraction",-PC) %>%
ggplot(aes(x=PC, y=Fraction, fill=ContainsTop)) +
geom_col(position="fill") +
scale_y_continuous(limits=c(0, 1), expand=c(0,0)) +
scale_fill_manual(values=c("FracContainsTop"="#636363", "Complement"="#f0f0f0"), labels=c("eGene colocalizes with top molQTL", "colocalizes with non top molQTL")) +
scale_x_discrete(labels=c("H3K27AC hQTL", "polyA RNA sQTL")) +
labs(x=NULL, y=str_wrap("Fraction of eQTL/molQTL colocalizations", 30), fill=NULL) +
Rotate_x_labels
P.NonPrimaryMolQTLs_OftenColoc
eGenes <- dat %>%
filter(PC1=="Expression.Splicing.Subset_YRI" & FDR.x < 0.1) %>%
pull(P1) %>% unique()
coloc.dat$PC %>% unique()
[1] "polyA.Splicing" "MetabolicLabelled.30min"
[3] "MetabolicLabelled.60min" "H3K27AC"
[5] "H3K4ME3" "H3K4ME1"
[7] "chRNA.IER" "ProCap"
[9] "CTCF" "chRNA.Expression_ncRNA"
[11] "Expression.Splicing.Subset_YRI" "chRNA.Splicing"
[13] "MetabolicLabelled.30min.Splicing" "MetabolicLabelled.60min.Splicing"
[15] "chRNA.Expression.Splicing" "H3K36ME3"
[17] "APA_Nuclear" "APA_Total"
[19] "polyA.IER" "MetabolicLabelled.30min.IER"
[21] "MetabolicLabelled.60min.IER"
coloc.dat %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI")) %>%
ungroup() %>%
count(Locus, PC) %>%
count(PC) %>%
mutate(PC_relabeled = recode(PC, !!!MolQTL_Labels)) %>%
filter(!is.na(PC_relabeled) | PC=="Any hQTL") %>%
filter(!PC_relabeled %in% c("Splicing", "Expression; eQTLs (YRI only)")) %>%
mutate(Assay_relabeled = recode(PC, !!!MolQTL_AssayLabels)) %>%
mutate(FinalLabels = paste(PC_relabeled, Assay_relabeled, sep="\n")) %>%
mutate(OrderFactor = recode(PC, !!!Factor_Order)) %>%
mutate(FinalLabels = fct_reorder(FinalLabels, OrderFactor)) %>%
ggplot(aes(x=FinalLabels, y=n)) +
geom_col() +
scale_y_continuous(
name="Number eGenes with colocalized molQTL",
sec.axis = sec_axis(~ . / length(eGenes), name = "% eGenes with colocalized molQTL")
) +
Rotate_x_labels +
labs(x="Molecular QTL type, Assay")
dat.eGeneColocsWith <- bind_rows(
coloc.dat %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI")) %>%
ungroup() %>%
count(Locus, PC) %>%
count(PC),
coloc.dat %>%
mutate(PC = case_when(
PC %in% c("H3K4ME1", "H3K4ME3", "H3K27AC", "H3K36ME3") ~ "Any hQTL",
TRUE ~ PC
)) %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI")) %>%
ungroup() %>%
count(Locus, PC) %>%
count(PC)
) %>%
mutate(PC_relabeled = recode(PC, !!!MolQTL_Labels)) %>%
filter(!is.na(PC_relabeled) | PC=="Any hQTL") %>%
filter(!PC_relabeled %in% c("Splicing", "Expression; eQTLs (YRI only)")) %>%
mutate(Assay_relabeled = recode(PC, !!!MolQTL_AssayLabels)) %>%
mutate(FinalLabels = paste(PC_relabeled, Assay_relabeled, sep="\n")) %>%
mutate(OrderFactor = recode(PC, !!!Factor_Order))
P.eGeneColocsWith <- dat.eGeneColocsWith %>%
mutate(FinalLabels = str_remove_all(FinalLabels, "\\*")) %>%
mutate(FinalLabels = case_when(
FinalLabels == "Any hQTL\nAny hQTL" ~ "Any hQTL*",
TRUE ~ FinalLabels
)) %>%
mutate(OrderFactor = case_when(
FinalLabels == "Any hQTL*" ~ 4.5,
TRUE ~ OrderFactor
)) %>%
mutate(FinalLabels = fct_reorder(FinalLabels, OrderFactor)) %>%
distinct(.keep_all=T) %>%
ggplot(aes(x=FinalLabels, y=n)) +
geom_col() +
scale_y_continuous(
name="Number eGenes with colocalized molQTL",
sec.axis = sec_axis(~ . / length(eGenes), name = "% eGenes with colocalized molQTL")
) +
Rotate_x_labels +
labs(x="Molecular QTL type, Assay")
P.eGeneColocsWith
dat.eGeneColocsWith
# A tibble: 19 × 6
PC n PC_relabeled Assay_relabeled FinalLabels OrderFactor
<chr> <int> <chr> <chr> <chr> <dbl>
1 APA_Nuclear 33 APA QTLs* Nuclear RNA 3'… "APA QTLs*… 7
2 APA_Total 38 APA QTLs* Total RNA 3'seq "APA QTLs*… 14
3 H3K27AC 565 H3K27Ac; hQ… H3K27Ac ChIP-s… "H3K27Ac; … 2
4 H3K36ME3 284 H3K36me3; h… H3K36me3 ChIP-… "H3K36me3;… 4
5 H3K4ME1 369 H3K4me1; hQ… H3K4me1 ChIP-s… "H3K4me1; … 1
6 H3K4ME3 485 H3K4me3; hQ… H3K4me3 ChIP-s… "H3K4me3; … 3
7 MetabolicLabelled… 433 Expression;… 30min 4sU RNA-… "Expressio… 8.5
8 MetabolicLabelled… 456 Expression;… 60min 4sU RNA-… "Expressio… 9
9 chRNA.Expression.… 647 Expression;… chRNA-seq "Expressio… 5
10 chRNA.Splicing 191 Splicing; s… chRNA-seq "Splicing;… 6
11 polyA.Splicing 562 Splicing; s… polyA RNA-seq "Splicing;… 12
12 APA_Nuclear 33 APA QTLs* Nuclear RNA 3'… "APA QTLs*… 7
13 APA_Total 38 APA QTLs* Total RNA 3'seq "APA QTLs*… 14
14 Any hQTL 828 Any hQTL Any hQTL "Any hQTL\… NA
15 MetabolicLabelled… 433 Expression;… 30min 4sU RNA-… "Expressio… 8.5
16 MetabolicLabelled… 456 Expression;… 60min 4sU RNA-… "Expressio… 9
17 chRNA.Expression.… 647 Expression;… chRNA-seq "Expressio… 5
18 chRNA.Splicing 191 Splicing; s… chRNA-seq "Splicing;… 6
19 polyA.Splicing 562 Splicing; s… polyA RNA-seq "Splicing;… 12
eGenes.NohQTLColoc <-
coloc.dat %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI") & !any(PC %in% c("H3K4ME3", "H3K27AC", "H3K4ME1", "H3K36ME3"))) %>%
pull(Locus) %>% unique()
dat.eGeneColocsWith.NohQTL <-
coloc.dat %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI") & !any(PC %in% c("H3K4ME3", "H3K27AC", "H3K4ME1", "H3K36ME3"))) %>%
ungroup() %>%
count(Locus, PC) %>%
count(PC) %>%
mutate(PC_relabeled = recode(PC, !!!MolQTL_Labels)) %>%
filter(!is.na(PC_relabeled) | PC=="Any hQTL") %>%
filter(!PC_relabeled %in% c("Splicing", "Expression; eQTLs (YRI only)")) %>%
mutate(Assay_relabeled = recode(PC, !!!MolQTL_AssayLabels)) %>%
mutate(FinalLabels = paste(PC_relabeled, Assay_relabeled, sep="\n")) %>%
mutate(OrderFactor = recode(PC, !!!Factor_Order))
P.eGeneColocsWith.nohQTL <- dat.eGeneColocsWith.NohQTL %>%
mutate(FinalLabels = str_remove_all(FinalLabels, "\\*")) %>%
mutate(FinalLabels = case_when(
FinalLabels == "Any hQTL\nAny hQTL" ~ "Any hQTL*",
TRUE ~ FinalLabels
)) %>%
mutate(OrderFactor = case_when(
FinalLabels == "Any hQTL*" ~ 4.5,
TRUE ~ OrderFactor
)) %>%
distinct(.keep_all=T) %>%
mutate(FinalLabels = fct_reorder(FinalLabels, OrderFactor)) %>%
ggplot(aes(x=FinalLabels, y=n)) +
geom_col() +
scale_y_continuous(
name="Number eGenes with colocalized molQTL",
sec.axis = sec_axis(~ . / length(eGenes.NohQTLColoc), name = "% eGenes with colocalized molQTL\nAmong those that did not coloc with hQTL")
) +
Rotate_x_labels +
labs(x="Molecular QTL type, Assay")
P.eGeneColocsWith.nohQTL
dat.eGeneColocsWith.NohQTL
# A tibble: 7 × 6
PC n PC_relabeled Assay_relabeled FinalLabels OrderFactor
<chr> <int> <chr> <chr> <chr> <dbl>
1 APA_Nuclear 16 APA QTLs* Nuclear RNA 3'… "APA QTLs*… 7
2 APA_Total 21 APA QTLs* Total RNA 3'seq "APA QTLs*… 14
3 MetabolicLabelled.… 153 Expression;… 30min 4sU RNA-… "Expressio… 8.5
4 MetabolicLabelled.… 162 Expression;… 60min 4sU RNA-… "Expressio… 9
5 chRNA.Expression.S… 213 Expression;… chRNA-seq "Expressio… 5
6 chRNA.Splicing 96 Splicing; s… chRNA-seq "Splicing;… 6
7 polyA.Splicing 258 Splicing; s… polyA RNA-seq "Splicing;… 12
data.frame(name="Non hQTL", n=length(eGenes.NohQTLColoc))
name n
1 Non hQTL 545
dat.SimplifiedNumberColocs <- bind_rows(
dat.eGeneColocsWith %>%
filter(PC=="Any hQTL") %>%
mutate(name="hQTL*", order=1) %>%
dplyr::select(name, n, order),
# dat.eGeneColocsWith %>%
# filter(PC=="polyA.Splicing") %>%
# mutate(name="sQTL", order=1.5) %>%
# dplyr::select(name, n, order),
dat.eGeneColocsWith.NohQTL %>%
filter(PC=="APA_Total") %>%
mutate(name="APA QTL**", order=4) %>%
dplyr::select(name, n, order),
dat.eGeneColocsWith.NohQTL %>%
filter(PC=="polyA.Splicing") %>%
mutate(name="sQTL**", order=3) %>%
dplyr::select(name, n, order),
) %>%
add_row(name="Any molQTL**", n=length(eGenes.NohQTLColoc), order=2) %>%
mutate(name = fct_reorder(name, order))
P.SimplifiedNumberColocs <- ggplot(dat.SimplifiedNumberColocs, aes(x=name, y=n)) +
geom_col(fill="#bdbdbd") +
scale_y_continuous(expand=c(0,0), breaks=c(0, 200, 400, 600, 800), labels=c(0, "", "", "", 800)) +
geom_text(aes(label=name), y=10, angle=90, y=-Inf, hjust=0, size=8) +
labs(x=NULL, y="# eGenes coloc w/ molQTL") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
P.SimplifiedNumberColocs
kable(dat.SimplifiedNumberColocs)
name | n | order |
---|---|---|
hQTL* | 828 | 1 |
APA QTL** | 21 | 4 |
sQTL** | 258 | 3 |
Any molQTL** | 545 | 2 |
IntronAnnotations <- read_tsv("../data/IntronAnnotationsFromYang.tsv.gz")
ClusterLevelQTLAnnotations <- read_delim("../code/QTLs/QTLTools/polyA.Splicing/PermutationPassForColoc.txt.gz", delim=' ') %>%
mutate(IntronID = str_replace(phe_id, "(^.+?clu.+?):EN.+$", "\\1")) %>%
separate(IntronID, into=c("chrom", "start", "end", "cluster"), remove=F, convert=T, sep=":") %>%
mutate(chrom=paste0("chr", chrom)) %>%
mutate(q = qvalue(adj_beta_pval)$qvalues) %>%
left_join(IntronAnnotations) %>%
group_by(cluster) %>%
mutate(ContainsUnproductive = any(SuperAnnotation %in% c("AnnotatedJunc_UnproductiveCodingGene", "UnannotatedJunc_UnproductiveCodingGene") & q<0.1)) %>%
ungroup()
ClusterLevelQTLAnnotations %>%
distinct(cluster, .keep_all=T) %>%
count(ContainsUnproductive)
# A tibble: 2 × 2
ContainsUnproductive n
<lgl> <int>
1 FALSE 31259
2 TRUE 4490
dat %>%
filter(PC1=="Expression.Splicing" & FDR.x < 0.1) %>%
filter(PC2=="polyA.Splicing" & p_permutation.y<0.01)
PC1 P1 GeneLocus
1: Expression.Splicing ENSG00000198793.13 ENSG00000198793.13
2: Expression.Splicing ENSG00000127481.15 ENSG00000127481.15
3: Expression.Splicing ENSG00000127481.15 ENSG00000127481.15
4: Expression.Splicing ENSG00000160094.15 ENSG00000160094.15
5: Expression.Splicing ENSG00000116922.14 ENSG00000116922.14
---
15055: Expression.Splicing ENSG00000197128.11 ENSG00000197128.11
15056: Expression.Splicing ENSG00000197128.11 ENSG00000197128.11
15057: Expression.Splicing ENSG00000197128.11 ENSG00000197128.11
15058: Expression.Splicing ENSG00000197128.11 ENSG00000197128.11
15059: Expression.Splicing ENSG00000197128.11 ENSG00000197128.11
p_permutation.x beta.x beta_se.x singletrait_topvar.x
1: 1.91282e-02 -0.119659 0.0283057 1:11324733:C:T
2: 4.07612e-05 0.125678 0.0219205 1:19123840:T:C
3: 4.07612e-05 0.125678 0.0219205 1:19123840:T:C
4: 2.74415e-02 0.181904 0.0442285 1:33257610:G:A
5: 4.57030e-13 0.332516 0.0384069 1:37683521:C:T
---
15055: 8.41409e-30 0.677943 0.0490979 19:57468799:C:T
15056: 8.41409e-30 0.677943 0.0490979 19:57468799:C:T
15057: 8.41409e-30 0.677943 0.0490979 19:57468799:C:T
15058: 8.41409e-30 0.677943 0.0490979 19:57468799:C:T
15059: 8.41409e-30 0.677943 0.0490979 19:57468799:C:T
singletrait_topvar_chr.x singletrait_topvar_pos.x FDR.x
1: chr1 11324733 6.570641e-03
2: chr1 19123840 2.177979e-05
3: chr1 19123840 2.177979e-05
4: chr1 33257610 9.127945e-03
5: chr1 37683521 5.503866e-13
---
15055: chr19 57468799 3.275936e-29
15056: chr19 57468799 3.275936e-29
15057: chr19 57468799 3.275936e-29
15058: chr19 57468799 3.275936e-29
15059: chr19 57468799 3.275936e-29
PC2 P2 p_permutation.y
1: polyA.Splicing 1:11213566:11228668:clu_262_- 4.49478e-03
2: polyA.Splicing 1:19151859:19152313:clu_356_- 8.08673e-03
3: polyA.Splicing 1:19151804:19152313:clu_356_- 2.36350e-03
4: polyA.Splicing 1:33256654:33270487:clu_3059_+ 3.48650e-03
5: polyA.Splicing 1:37689883:37690002:clu_644_- 4.23258e-03
---
15055: polyA.Splicing 19:57475786:57476380:clu_44453_- 7.50154e-03
15056: polyA.Splicing 19:57475146:57475660:clu_44453_- 5.51692e-03
15057: polyA.Splicing 19:57467085:57476634:clu_44453_- 5.08816e-05
15058: polyA.Splicing 19:57456182:57476634:clu_44453_- 1.47938e-05
15059: polyA.Splicing 19:57455743:57476634:clu_44453_- 5.45558e-06
beta.y beta_se.y singletrait_topvar.y singletrait_topvar_chr.y
1: 0.371906 0.0695477 1:11264089:CA:C chr1
2: -1.516750 0.2875090 1:19304026:G:C chr1
3: 0.877172 0.1666940 1:19264724:CTT:C chr1
4: 1.162510 0.2540520 1:33239322:G:C chr1
5: 0.349353 0.0767724 1:37681494:T:C chr1
---
15055: -0.209531 0.0459838 19:57449940:G:A chr19
15056: 0.312532 0.0719654 19:57469325:A:G chr19
15057: -0.279717 0.0478831 19:57450880:A:AC chr19
15058: -0.258663 0.0458791 19:57449940:G:A chr19
15059: -0.259654 0.0430860 19:57449940:G:A chr19
singletrait_topvar_pos.y FDR.y trait.x.p.in.y x.beta.in.y
1: 11264085 0.0453688625 3.25581e-01 0.05805960
2: 19304026 0.0731896350 8.65210e-01 0.01348910
3: 19264723 0.0264276619 4.43321e-01 -0.03353030
4: 33239322 0.0368014521 8.87217e-01 -0.00999069
5: 37681494 0.0431957225 1.39465e-04 0.28945300
---
15055: 57449940 0.0689080148 3.01964e-03 -0.14960100
15056: 57469325 0.0535979864 9.87470e-04 -0.21649800
15057: 57450880 0.0008509990 1.05411e-07 -0.27950300
15058: 57449940 0.0002733075 2.71867e-06 -0.23710600
15059: 57449940 0.0001085746 3.63617e-09 -0.27939000
x.beta_se.in.y
1: 0.0589955
2: 0.0794210
3: 0.0437003
4: 0.0704026
5: 0.0753311
---
15055: 0.0501670
15056: 0.0652881
15057: 0.0517077
15058: 0.0498955
15059: 0.0464090
Clusters.ThatColocWith_sQTL_not_hQTL <- coloc.dat %>%
group_by(Locus, snp) %>%
filter(any(PC=="Expression.Splicing.Subset_YRI") & !any(PC %in% c("H3K4ME3", "H3K27AC", "H3K4ME1"))) %>%
ungroup() %>%
filter(PC=="polyA.Splicing") %>%
separate(P, into=c("chrom", "start", "end", "cluster"), remove=F, convert=T, sep=":") %>%
distinct(cluster)
Clusters.AttemptColoc_butOther <- dat %>%
filter(PC1=="Expression.Splicing" & FDR.x < 0.1) %>%
filter(PC2=="polyA.Splicing" & p_permutation.y<0.01) %>%
distinct(P2) %>%
separate(P2, into=c("chrom", "start", "end", "cluster"), remove=F, convert=T, sep=":") %>%
distinct(cluster) %>%
filter(!cluster %in% Clusters.ThatColocWith_sQTL_not_hQTL$cluster)
dat.EnrichmentOfUnrpductiveClusters <- bind_rows(
Clusters.AttemptColoc_butOther %>%
mutate(Category="DoesNotColocWith_eGene"),
Clusters.ThatColocWith_sQTL_not_hQTL %>%
mutate(Category="DoesColocWith_eGene")
) %>%
left_join(
ClusterLevelQTLAnnotations %>%
distinct(cluster, ContainsUnproductive))
test.results <- dat.EnrichmentOfUnrpductiveClusters %>%
count(Category, ContainsUnproductive) %>%
pivot_wider(names_from="Category", values_from ="n") %>%
mutate(ContainsUnproductive=!ContainsUnproductive) %>%
column_to_rownames("ContainsUnproductive") %>%
as.matrix() %>%
fisher.test(alternative="less")
test.results
Fisher's Exact Test for Count Data
data: .
p-value = 1.982e-15
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
0.0000000 0.5214947
sample estimates:
odds ratio
0.4311536
P.EnrichmentOfUnrpductiveClusters <- dat.EnrichmentOfUnrpductiveClusters %>%
count(Category, ContainsUnproductive) %>%
ggplot(aes(x=Category, fill=ContainsUnproductive, y=n)) +
geom_col() +
scale_fill_manual(values=c("FALSE"="#377eb8", "TRUE"="#e41a1c"), labels=c("FALSE"="Productive sQTL", "TRUE"="Unproductive sQTL")) +
scale_x_discrete(labels=c(str_wrap("sQTLs that coloc with eQTL and not hQTL", 20), str_wrap("Other sQTLs tested\nfor colocalization", 20))) +
labs(caption=paste("One sided Fisher test P:", format.pval(test.results$p.value, 3), "\nOR:", round(1/test.results$estimate, 3)), y="Number of sQTLs", fill="sQTL type", x="") +
Rotate_x_labels
P.EnrichmentOfUnrpductiveClusters
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_SampleSize.pdf",P.SampleSizes, width=6, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_NumQTLs.pdf",NumQTLs, width=8, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_PercentAnnotated.pdf",P.PercentAnnotated, width=4, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_PercentOfUnannotated.pdf",P.PercentOfUnannotated, width=6, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_PercentAnnotatedLegend.pdf",P.PercentJuncsLegend, width=8, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_Multiple_hQTLs_PerGene.pdf",P.Number_hQTLsForColoc, width=6, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_Multiple_sQTLs_PerGene.pdf",P.Number_sQTLsForColoc, width=6, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_NonPrimaryMolQTLsColoc.pdf",P.NonPrimaryMolQTLs_OftenColoc, width=6, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_EnrichmentOfUnrpoductiveSqtls.pdf",P.EnrichmentOfUnrpductiveClusters, width=6, height=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_eGene_MolQTL_Colocs.pdf",P.eGeneColocsWith, width=10, height=7)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_eGene_MolQTL_NonhQTL_Colocs.pdf",P.eGeneColocsWith.nohQTL, width=10, height=7)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig_S_eGene_MolQTL_Simplified.pdf",P.SimplifiedNumberColocs, width=2.7, height=2.7)
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggbreak_0.1.1 knitr_1.39 ggrepel_0.9.1 qvalue_2.28.0
[5] readxl_1.4.0 data.table_1.14.2 RColorBrewer_1.1-3 forcats_0.5.1
[9] stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[13] tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] nlme_3.1-157 fs_1.5.2 lubridate_1.8.0 bit64_4.0.5
[5] httr_1.4.3 rprojroot_2.0.3 tools_4.2.0 backports_1.4.1
[9] bslib_0.3.1 utf8_1.2.2 R6_2.5.1 mgcv_1.8-40
[13] DBI_1.1.2 colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[17] bit_4.0.4 compiler_4.2.0 git2r_0.30.1 cli_3.3.0
[21] rvest_1.0.2 xml2_1.3.3 labeling_0.4.2 sass_0.4.1
[25] scales_1.2.0 digest_0.6.29 yulab.utils_0.0.6 rmarkdown_2.14
[29] R.utils_2.11.0 pkgconfig_2.0.3 htmltools_0.5.2 highr_0.9
[33] dbplyr_2.1.1 fastmap_1.1.0 rlang_1.0.2 rstudioapi_0.13
[37] farver_2.1.0 gridGraphics_0.5-1 jquerylib_0.1.4 generics_0.1.2
[41] jsonlite_1.8.0 vroom_1.5.7 R.oo_1.24.0 magrittr_2.0.3
[45] ggplotify_0.1.0 Matrix_1.5-3 patchwork_1.1.1 Rcpp_1.0.8.3
[49] munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.1 R.methodsS3_1.8.1
[53] stringi_1.7.6 whisker_0.4 yaml_2.3.5 plyr_1.8.7
[57] grid_4.2.0 parallel_4.2.0 promises_1.2.0.1 crayon_1.5.1
[61] lattice_0.20-45 haven_2.5.0 splines_4.2.0 hms_1.1.1
[65] pillar_1.7.0 reshape2_1.4.4 reprex_2.0.1 glue_1.6.2
[69] evaluate_0.15 ggfun_0.0.9 modelr_0.1.8 vctrs_0.4.1
[73] tzdb_0.3.0 httpuv_1.6.5 cellranger_1.1.0 gtable_0.3.0
[77] assertthat_0.2.1 xfun_0.30 broom_0.8.0 later_1.3.0
[81] aplot_0.1.10 workflowr_1.7.0 ellipsis_0.3.2