Last updated: 2024-02-13
Checks: 6 1
Knit directory:
ChromatinSplicingQTLs/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20191126)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version c56b87d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: code/-
Ignored: code/.DS_Store
Ignored: code/.RData
Ignored: code/._report.html
Ignored: code/.ipynb_checkpoints/
Ignored: code/.snakemake/
Ignored: code/APA_Processing/
Ignored: code/Alignments/
Ignored: code/ChromHMM/
Ignored: code/ENCODE/
Ignored: code/ExpressionAnalysis/
Ignored: code/ExtractPhenotypeBedByGenotype.py
Ignored: code/FastqFastp/
Ignored: code/FastqFastpSE/
Ignored: code/FastqSE/
Ignored: code/FineMapping/
Ignored: code/GTEx/
Ignored: code/Gencode.v34.6Colors.bed.gz
Ignored: code/Genotypes/
Ignored: code/H3K36me3_CutAndTag.pdf
Ignored: code/IntronSlopes/
Ignored: code/LR.bed
Ignored: code/LR.seq.bed
Ignored: code/LongReads/
Ignored: code/MYB.tracks.ini
Ignored: code/Metaplots/
Ignored: code/Misc/
Ignored: code/MiscCountTables/
Ignored: code/Multiqc/
Ignored: code/Multiqc_chRNA/
Ignored: code/NonCodingRNA/
Ignored: code/NonCodingRNA_annotation/
Ignored: code/PairwisePi1Traits.P.all.txt.gz
Ignored: code/PeakCalling/
Ignored: code/Phenotypes/
Ignored: code/PlotGruberQTLs/
Ignored: code/PlotQTLs/
Ignored: code/ProCapAnalysis/
Ignored: code/QC/
Ignored: code/QTL_SNP_Enrichment/
Ignored: code/QTLs/
Ignored: code/RPKM_tables/
Ignored: code/ReadLengthMapExperiment/
Ignored: code/ReadLengthMapExperimentResults/
Ignored: code/ReadLengthMapExperimentSpliceCounts/
Ignored: code/ReferenceGenome/
Ignored: code/Rplots.pdf
Ignored: code/Session.vim
Ignored: code/SmallMolecule/
Ignored: code/SplicingAnalysis/
Ignored: code/TODO
Ignored: code/Tehranchi/
Ignored: code/alias/
Ignored: code/bigwigs/
Ignored: code/bigwigs_FromNonWASPFilteredReads/
Ignored: code/config/.DS_Store
Ignored: code/config/._.DS_Store
Ignored: code/config/.ipynb_checkpoints/
Ignored: code/config/config.local.yaml
Ignored: code/dag.pdf
Ignored: code/dag.png
Ignored: code/dag.svg
Ignored: code/data/
Ignored: code/debug.ipynb
Ignored: code/debug_python.ipynb
Ignored: code/deepTools/
Ignored: code/featureCounts/
Ignored: code/featureCountsBasicGtf/
Ignored: code/genome_config.yaml
Ignored: code/gwas_summary_stats/
Ignored: code/hyprcoloc/
Ignored: code/igv_session.xml
Ignored: code/isoseqbams/
Ignored: code/log
Ignored: code/logs/
Ignored: code/notebooks/.ipynb_checkpoints/
Ignored: code/pi1/
Ignored: code/polyA.Splicing.Subset_YRI.NominalPassForColoc.bed.bgz
Ignored: code/rules/.ipynb_checkpoints/
Ignored: code/rules/OldRules/
Ignored: code/rules/notebooks/
Ignored: code/salmontest/
Ignored: code/scratch/
Ignored: code/scripts/.ipynb_checkpoints/
Ignored: code/scripts/GTFtools_0.8.0/
Ignored: code/scripts/__pycache__/
Ignored: code/scripts/liftOverBedpe/liftOverBedpe.py
Ignored: code/snakemake.dryrun.log
Ignored: code/snakemake.log
Ignored: code/snakemake.sbatch.log
Ignored: code/snakemake_profiles/slurm/__pycache__/
Ignored: code/test.introns.bed
Ignored: code/test.introns2.bed
Ignored: code/test.log
Ignored: code/tracks.xml
Ignored: data/.DS_Store
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-10.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-11.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-2.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-3.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-4.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-5.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-6.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-7.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022-8.csv
Ignored: data/GWAS_catalog_summary_stats_sources/._list_gwas_summary_statistics_6_Apr_2022.csv
Ignored: data/Metaplots/.DS_Store
Untracked files:
Untracked: analysis/2024-01-17_CheckYangsJunctionAnnotator.Rmd
Unstaged changes:
Modified: analysis/MakeFinalFigs_Fig2.Rmd
Modified: analysis/MakeFinalFigs_Fig4.Rmd
Modified: code/scripts/GenometracksByGenotype
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/MakeFinalFigs_Fig4.Rmd
)
and HTML (docs/MakeFinalFigs_Fig4.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 42d9a7b | Benjmain Fair | 2023-09-14 | ben updates |
Rmd | ada72d2 | Benjmain Fair | 2023-07-24 | updates Ben |
Rmd | 3d0e9c1 | Benjmain Fair | 2023-06-21 | update intron annotations |
Rmd | 5946b5a | Benjmain Fair | 2023-05-22 | update site |
knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.7 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.0
✔ readr 2.1.2 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(RColorBrewer)
library(data.table)
Attaching package: 'data.table'
The following objects are masked from 'package:dplyr':
between, first, last
The following object is masked from 'package:purrr':
transpose
library(edgeR)
Loading required package: limma
library(readxl)
library(qvalue)
library(ggrepel)
library(knitr)
library(drc)
Loading required package: MASS
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
'drc' has been loaded.
Please cite R and 'drc' if used for a publication,
for references type 'citation()' and 'citation('drc')'.
Attaching package: 'drc'
The following objects are masked from 'package:stats':
gaussian, getInitial
library(ggbreak)
ggbreak v0.1.1
If you use ggbreak in published research, please cite the following
paper:
S Xu, M Chen, T Feng, L Zhan, L Zhou, G Yu. Use ggbreak to effectively
utilize plotting space to deal with large datasets and outliers.
Frontiers in Genetics. 2021, 12:774846. doi: 10.3389/fgene.2021.774846
# Set theme
theme_set(
theme_classic() +
theme(text=element_text(size=16, family="Helvetica")))
# I use layer a lot, to rotate long x-axis labels
Rotate_x_labels <- theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
IntronAnnotatins <- read_tsv("../data/IntronAnnotationsFromYang.tsv.gz") %>%
mutate(chrom = str_remove_all(chrom, "chr")) %>%
mutate(Intron = paste(chrom, start, end, sep=":")) %>%
filter(!str_detect(SuperAnnotation, "NoncodingGene"))
small.molecule.dat.genes <- read_tsv("../code/SmallMolecule/FitModels/polyA_genes.tsv.gz")
# Model fits using PSI using intron trios
small.molecule.dat.introns <- read_tsv("../code/SmallMolecule/FitModels/polyA_GAGTIntrons_asPSI.tsv.gz") %>%
mutate(Intron = str_replace(junc, "^chr(.+?):[+-]$", "\\1")) %>%
left_join(IntronAnnotatins)
ClusterSignificance <- Sys.glob("../code/SmallMolecule/leafcutter/ds/chRNA_risdiplam_*_cluster_significance.txt") %>%
setNames(str_replace(., "../code/SmallMolecule/leafcutter/ds/chRNA_risdiplam_(.+?)_cluster_significance.txt", "\\1")) %>%
lapply(read_tsv) %>%
bind_rows(.id="dose.nM")
EffectSizes <- Sys.glob("../code/SmallMolecule/leafcutter/ds/chRNA_risdiplam_*_effect_sizes.txt") %>%
setNames(str_replace(., "../code/SmallMolecule/leafcutter/ds/chRNA_risdiplam_(.+?)_effect_sizes.txt", "\\1")) %>%
lapply(read_tsv) %>%
bind_rows(.id="dose.nM") %>%
mutate(cluster = str_replace(intron, "^(.+?):.+?:.+?(:clu_.+$)", "\\1\\2"))
chRNA.splicing.dat <- inner_join(EffectSizes, ClusterSignificance)
Translated.GAGT_CassetteExons <- read_tsv("../code/SmallMolecule/CassetteExons/ExonsToTranslate.Translated.tsv.gz") %>%
dplyr::select(-chrom, -Strand) %>%
mutate(TranslationDiff = str_length(IncludedTranslation) - str_length(SkippedTranslation)) %>%
mutate(InFrame = str_length(IncludedTranslation)-str_length(SkippedTranslation)==(ExonStop_Cassette-ExonStart_Cassette)/3) %>%
mutate(ExonRemainder = (ExonStop_Cassette - ExonStart_Cassette)%%3) %>%
mutate(Effect = case_when(
is.na(InFrame) ~ "Cassette in UTR",
InFrame ~ "Preserves frame",
ExonRemainder %in% c(1,2) ~ "Frame shifting",
InFrame == F ~ "In-frame PTC",
)) %>%
left_join(
IntronAnnotatins %>%
unite(Intron, chrom, start, end, strand, sep=":") %>%
mutate(GAGTInt = paste0("chr", Intron))
)
leafcutter.counts <- read.table("../code/SmallMolecule/leafcutter/clustering/autosomes/leafcutter_perind_numers.counts.gz", header=T, sep=' ') %>% as.matrix()
library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
ensembl_to_symbols <- getBM(attributes= c("ensembl_gene_id","hgnc_symbol"),mart=ensembl)
Expression.tidyForDoseResponse <- read_tsv("/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/DoseResponseData/LCL/TidyExpressionDoseData.txt.gz") %>%
mutate(gene = str_replace(Geneid, "(^.+?)\\..+?$", "\\1")) %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id"))
chRNA.DE.dat <- read_tsv("../code/SmallMolecule/chRNA/DE.results.tsv.gz")
First I need due testing to get pvalue from dose response fits in polyA data at the low and high dose
polyA.GeneEffects <- small.molecule.dat.genes %>%
filter(str_detect(param, "Pred")) %>%
mutate(gene = str_replace(Geneid, "(^.+?)\\..+?$", "\\1")) %>%
separate(param, into=c("Dummy", "dose.nM"), convert=T, sep="_") %>%
dplyr::select(-Dummy)
GeneEffects.Changes <-
inner_join(
polyA.GeneEffects %>%
filter(dose.nM == 0) %>%
dplyr::select(-Geneid, -dose.nM),
polyA.GeneEffects %>%
filter(!dose.nM == 0),
by=c("gene"),
suffix=c(".expression.untreated", ".expression.treated")
) %>%
distinct() %>%
mutate(Estimate.Expression.FC = Estimate.expression.treated - Estimate.expression.untreated) %>%
mutate(z = Estimate.Expression.FC/sqrt((SE.expression.treated + SE.expression.untreated))) %>%
mutate(abs.z = abs(z)) %>%
# filter(dose.nM == 100) %>%
mutate(polyA.P = 2*pnorm(abs.z, lower.tail = F)) %>%
group_by(dose.nM) %>%
add_count(gene) %>%
filter(n==1) %>%
mutate(FDR = qvalue(polyA.P)$qvalues) %>%
ungroup() %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id"))
GeneEffects.Changes %>%
ggplot(aes(x=polyA.P)) +
geom_histogram() +
facet_wrap(~dose.nM)
GeneEffects.Changes %>%
filter(dose.nM==100 & FDR<0.1) %>%
arrange(polyA.P)
# A tibble: 3 × 14
Estimate.expression.un… SE.expression.u… gene Geneid dose.nM Estimate.expres…
<dbl> <dbl> <chr> <chr> <int> <dbl>
1 3.05 0.0621 ENSG… ENSG0… 100 0.458
2 7.25 0.0495 ENSG… ENSG0… 100 5.44
3 2.26 0.211 ENSG… ENSG0… 100 4.75
# … with 8 more variables: SE.expression.treated <dbl>,
# Estimate.Expression.FC <dbl>, z <dbl>, abs.z <dbl>, polyA.P <dbl>, n <int>,
# FDR <dbl>, hgnc_symbol <chr>
SigLowDoseGenes <- GeneEffects.Changes %>%
filter(dose.nM==100 & FDR<0.1) %>%
arrange(polyA.P) %>% pull(gene)
Expression.tidyForDoseResponse %>%
filter(gene %in% SigLowDoseGenes | hgnc_symbol %in% c("HTT", "SMN2", "GALC", "STAT1", "KCNT2", "FHOD3", "SRSF10")) %>%
ggplot(aes(x=dose.nM, y=CPM, color=treatment)) +
geom_line() +
scale_y_continuous(trans="log2") +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 1, 0.316, 0)) +
facet_wrap(~hgnc_symbol, scales="free_y") +
Rotate_x_labels
Ok, the test actually seems sort of reasonably calibrated at the high dose, but is overly conservative at the low dose. This issue might stem from the fact that the exact model i fit too (the log-logistic), might not be the best choice of sigmoidal model, especially around the elbows at low doses where an effect is just starting to be measurable. But fixing that could take a bit of work. There may be other workarounds as well to get a better calibrated test. Intuitively I feel what I’m doing (fitting model, then using model to output prediction with standard error at dose=100nM and dose=0nM) isn’t really using all the data very efficiently. Well I guess I don’t really mind being overly conservative, and given how the message of the paper doesn’t at all revolve around this point. So rather than figure out how to better calibrate this test, I think I’ll just be content with this methodology. Especially since the few significant hits actually do seem quite believable when I look at the full dose response set. And FOS is actually in interesting gene. I’m surprised I have missed that thus far.
Ok, so now I will make the final looking volcano plots
volcano.dat <- bind_rows(
chRNA.DE.dat %>%
separate(Geneid, into=c("Geneid", "symbol"), sep="\\_") %>%
mutate(gene = str_replace(Geneid, "(^.+?)\\..+?$", "\\1")) %>%
dplyr::select(logFC, PValue, FDR, gene, symbol, dose.nM=risdiplam_conc) %>%
mutate(dataset = "chRNA"),
GeneEffects.Changes %>%
dplyr::select(logFC=Estimate.Expression.FC, PValue=polyA.P, gene, dose.nM, symbol=hgnc_symbol, FDR, Estimate.expression.untreated) %>%
mutate(dataset = "polyA")
)
volcano.dat %>% filter(dose.nM == 100 & FDR<0.1) %>%
arrange(dataset, FDR)
# A tibble: 11 × 8
logFC PValue FDR gene symbol dose.nM dataset Estimate.expres…
<dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr> <dbl>
1 2.22 7.19e-10 0.00000806 ENSG00… TEX21P 100 chRNA NA
2 1.72 1.01e- 9 0.00000806 ENSG00… CAND2 100 chRNA NA
3 0.998 1.95e- 7 0.00104 ENSG00… ANP32A 100 chRNA NA
4 0.776 1.80e- 6 0.00647 ENSG00… GALC 100 chRNA NA
5 1.03 2.02e- 6 0.00647 ENSG00… POLN 100 chRNA NA
6 1.02 8.20e- 6 0.0218 ENSG00… C19or… 100 chRNA NA
7 0.865 1.92e- 5 0.0438 ENSG00… WDR91 100 chRNA NA
8 -0.448 4.82e- 5 0.0962 ENSG00… MZB1 100 chRNA NA
9 -2.60 2.33e-13 0.00000000232 ENSG00… TTC12 100 polyA 3.05
10 -1.82 1.17e- 6 0.00583 ENSG00… PDXDC1 100 polyA 7.25
11 2.48 2.90e- 5 0.0963 ENSG00… FOS 100 polyA 2.26
volcano.dat %>%
filter(!is.na(FDR)) %>%
ggplot( aes(x=logFC, y=-log10(PValue), color=FDR<0.1)) +
geom_point(alpha=0.1) +
facet_wrap(dataset~dose.nM, scales = "free_y")
volcano.dat %>%
filter(symbol %in% c("HTT", "SMN2", "GALC", "STAT1", "KCNT2", "FHOD3")) %>%
arrange(dataset, dose.nM)
# A tibble: 18 × 8
logFC PValue FDR gene symbol dose.nM dataset Estimate.expres…
<dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr> <dbl>
1 0.776 1.80e- 6 6.47e- 3 ENSG00000… GALC 100 chRNA NA
2 0.410 2.10e- 3 6.06e- 1 ENSG00000… FHOD3 100 chRNA NA
3 0.0236 8.14e- 1 9.23e- 1 ENSG00000… STAT1 100 chRNA NA
4 -0.0218 8.56e- 1 9.41e- 1 ENSG00000… SMN2 100 chRNA NA
5 0.998 1.62e- 7 1.52e- 5 ENSG00000… GALC 3160 chRNA NA
6 0.493 5.22e- 4 5.93e- 3 ENSG00000… STAT1 3160 chRNA NA
7 0.375 3.50e- 3 2.59e- 2 ENSG00000… FHOD3 3160 chRNA NA
8 0.0800 5.08e- 1 8.74e- 1 ENSG00000… SMN2 3160 chRNA NA
9 -0.266 4.08e- 1 1 e+ 0 ENSG00000… STAT1 100 polyA 10.9
10 -0.204 7.90e- 1 1 e+ 0 ENSG00000… HTT 100 polyA 8.31
11 -0.00590 9.66e- 1 1 e+ 0 ENSG00000… SMN2 100 polyA 5.99
12 -1.94 4.02e- 3 1 e+ 0 ENSG00000… GALC 100 polyA 3.38
13 -1.06 1.64e- 3 1 e+ 0 ENSG00000… FHOD3 100 polyA 3.65
14 -4.36 1.42e-36 2.87e-34 ENSG00000… STAT1 3160 polyA 10.9
15 -1.84 1.78e- 3 9.96e- 3 ENSG00000… HTT 3160 polyA 8.31
16 -0.294 9.95e- 2 1.90e- 1 ENSG00000… SMN2 3160 polyA 5.99
17 -2.26 7.45e- 5 6.72e- 4 ENSG00000… GALC 3160 polyA 3.38
18 -2.88 2.07e-18 1.36e-16 ENSG00000… FHOD3 3160 polyA 3.65
volcano.dat %>%
filter(!is.na(FDR)) %>%
mutate(Effect = if_else(FDR<0.1, sign(logFC), NaN )) %>%
count(Effect, dose.nM, dataset) %>%
arrange(dataset, dose.nM, Effect)
# A tibble: 12 × 4
Effect dose.nM dataset n
<dbl> <dbl> <chr> <int>
1 -1 100 chRNA 1
2 1 100 chRNA 7
3 NaN 100 chRNA 15970
4 -1 3160 chRNA 2025
5 1 3160 chRNA 1165
6 NaN 3160 chRNA 12788
7 -1 100 polyA 2
8 1 100 polyA 1
9 NaN 100 polyA 9964
10 -1 3160 polyA 1399
11 1 3160 polyA 780
12 NaN 3160 polyA 7769
volcano.dat %>%
filter(!is.na(FDR)) %>%
ggplot( aes(x=logFC, y=-log10(PValue), color=FDR<0.1)) +
geom_point(alpha=0.5) +
facet_wrap(dataset~dose.nM, scales = "free_y")
volcano.dat %>%
filter(!is.na(FDR)) %>%
ggplot(aes(x=logFC)) +
stat_ecdf() +
geom_vline(xintercept = 0) +
coord_cartesian(xlim=c(-2,2)) +
facet_wrap(dataset~dose.nM, scales = "free_y")
volcano.dat %>%
filter(!is.na(FDR)) %>%
ggplot( aes(x=logFC, y=-log10(PValue), color=FDR<0.1)) +
geom_point(alpha=0.5) +
facet_wrap(dataset~dose.nM, scales = "free_y")
P.MA.count.labels <- volcano.dat %>%
filter(!is.na(FDR)) %>%
filter(dataset == "polyA") %>%
mutate(dose.nM = recode(dose.nM, `100`="100nM risdiplam", `3160`="3160nM risdiplam")) %>%
filter(FDR<0.01) %>%
mutate(sign = sign(logFC)) %>%
mutate(UpOrDown = recode(sign, `1`="Up", `-1`="Down")) %>%
count(dose.nM, sign, UpOrDown, .drop=F) %>%
mutate(y = Inf * sign) %>%
mutate(label = str_glue("{n} {UpOrDown}"))
P.MA <- volcano.dat %>%
filter(!is.na(FDR)) %>%
filter(dataset == "polyA") %>%
mutate(dose.nM = recode(dose.nM, `100`="100nM risdiplam", `3160`="3160nM risdiplam")) %>%
ggplot( aes(x=Estimate.expression.untreated, y=logFC, color=FDR<0.01)) +
geom_point(alpha=0.5) +
geom_hline(yintercept=0, linetype='dashed') +
scale_color_manual(values=c(`TRUE`="red", `FALSE`="gray")) +
geom_text(data=P.MA.count.labels, aes(y=y, label=label, vjust=sign), x=Inf, color="red", hjust=1) +
facet_wrap(~dose.nM, scales = "free_y") +
labs(x="Expression in DMSO control (log2CPM)", y="log2FoldChange,\nestimated from\ndose-response curve fit")
P.MA
#First see how my annotations match up with Gencode and Yang's annotations
Translated.GAGT_CassetteExons %>%
mutate(IsAnnotated = if_else(str_detect(SuperAnnotation,"Annotated"), "Annotated by Gencode", "Annotated by Yang")) %>%
replace_na(list(SemiSupergroupAnnotations="Unannotated by either", IsAnnotated="Unannotated by either")) %>%
count(Effect, IsAnnotated, SemiSupergroupAnnotations) %>%
ggplot(aes(x=Effect, y=n, fill=SemiSupergroupAnnotations)) +
geom_col(position="stack") +
facet_wrap(~IsAnnotated, scales="free_y", nrow = 3) +
Rotate_x_labels +
labs(title="Comparing my annotations to Gencode/Yang", x="My annotation by comparing translations", fill="Gencode/Yang's annotation")
Ok, I feel good enough about my annotations which I think were slightly more careful than Yang’s but nonetheless mostly yield the same functional result - that the vast majority of unannotated junctions will be non-productive.
Now let’s plot something that I would want to include in manuscript
Translated.GAGT_CassetteExons$Effect %>% unique()
[1] "Frame shifting" "In-frame PTC" "Cassette in UTR" "Preserves frame"
P.bar.dat <- Translated.GAGT_CassetteExons %>%
mutate(IsAnnotated = if_else(str_detect(SuperAnnotation,"Annotated"), "Annotated by Gencode", "Unannotated", missing="Unannotated")) %>%
mutate(MergedEffect = case_when(
IsAnnotated == "Annotated by Gencode" & SemiSupergroupAnnotations=="basic tag" ~ "Productive",
IsAnnotated == "Annotated by Gencode" & !SemiSupergroupAnnotations=="basic tag" ~ "NMD",
TRUE ~ Effect
)) %>%
mutate(ColorEffect = if_else(MergedEffect %in% c("Productive", "Preserves frame", "Cassette in UTR"), "Productive", "NMD") )
#Check MYB
P.bar.dat %>%
filter(str_detect(gene, "ENSG00000118513"))
# A tibble: 1 × 24
GAGTInt ExonStart_Upstr… ExonStop_Upstre… ExonStart_Casse… ExonStop_Casset…
<chr> <dbl> <dbl> <dbl> <dbl>
1 chr6:1351… 135198907 135199050 135199525 135199581
# … with 19 more variables: ExonStart_Downstream <dbl>,
# ExonStop_Downstream <dbl>, gene <chr>, transcript <chr>,
# SkipJuncName <chr>, IncludedTranslation <chr>, SkippedTranslation <chr>,
# TranslationDiff <int>, InFrame <lgl>, ExonRemainder <dbl>, Effect <chr>,
# Intron <chr>, NewAnnotation <chr>, symbol <chr>, SuperAnnotation <chr>,
# SemiSupergroupAnnotations <chr>, IsAnnotated <chr>, MergedEffect <chr>,
# ColorEffect <chr>
P.bar <- P.bar.dat %>%
count(MergedEffect, ColorEffect, IsAnnotated) %>%
mutate(MergedEffect=factor(MergedEffect, levels=c("Productive", "Cassette in UTR","Preserves frame", "NMD", "In-frame PTC", "Frame shifting"))) %>%
mutate(IsAnnotated = recode(IsAnnotated, "Annotated by Gencode"="Annotated")) %>%
ggplot(aes(y=IsAnnotated, x=n, fill=MergedEffect, color=ColorEffect)) +
geom_col(position="stack", size=3) +
geom_text(aes(label=n), color='black', position=position_stack(vjust=0.5)) +
facet_wrap(~IsAnnotated, nrow=2, scales = "free") +
scale_color_manual(values=c(
"Productive" = "#1f78b4",
"NMD" = "#e31a1c"
)) +
scale_fill_manual(values=c(
"Productive"="#1f78b4",
"NMD"="#e31a1c",
"Frame shifting"="#fb6a4a",
"Cassette in UTR"="#deebf7",
"In-frame PTC"="#fcbba1",
"Preserves frame"="#a6cee3"
),
labels=c(
"Productive"="In annotated productive transcript",
"NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame"
)) +
scale_x_continuous(expand=c(0,0)) +
Rotate_x_labels +
guides(fill = guide_legend(order = 2),
color = guide_legend(order = 1)) +
theme(legend.box="vertical", legend.position = "bottom") +
theme(
strip.background = element_blank(),
strip.text.x = element_blank()
) +
labs(title="Predicted transcript effects of induced cassete exons", x="Number risdiplam-induced cassette exons", fill="Reason", y=NULL, color="Transcript effect")
P.bar
P.bar.dat %>%
filter(str_detect(gene, "ENSG00000188529"))
# A tibble: 0 × 24
# … with 24 variables: GAGTInt <chr>, ExonStart_Upstream <dbl>,
# ExonStop_Upstream <dbl>, ExonStart_Cassette <dbl>, ExonStop_Cassette <dbl>,
# ExonStart_Downstream <dbl>, ExonStop_Downstream <dbl>, gene <chr>,
# transcript <chr>, SkipJuncName <chr>, IncludedTranslation <chr>,
# SkippedTranslation <chr>, TranslationDiff <int>, InFrame <lgl>,
# ExonRemainder <dbl>, Effect <chr>, Intron <chr>, NewAnnotation <chr>,
# symbol <chr>, SuperAnnotation <chr>, SemiSupergroupAnnotations <chr>, …
I want to make one more version of the bar plot because it might more compact to manually annotate labels over the sub-bars instead of using fill colors
P.bar.simpler <- P.bar.dat %>%
count(MergedEffect, ColorEffect, IsAnnotated) %>%
mutate(MergedEffect=factor(MergedEffect, levels=c("Productive", "Cassette in UTR","Preserves frame", "NMD", "In-frame PTC", "Frame shifting"))) %>%
mutate(IsAnnotated = recode(IsAnnotated, "Annotated by Gencode"="Annotated")) %>%
ggplot(aes(y=IsAnnotated, x=n, fill=MergedEffect)) +
geom_col(position="stack", color='black') +
# geom_text(aes(label=str_glue("{MergedEffect}; n={n}")), color='black', position=position_stack(vjust=0.5)) +
facet_wrap(~IsAnnotated, nrow=2, scales = "free") +
scale_fill_manual(values=c(
"Productive"="#1f78b4",
"NMD"="#e31a1c",
"Frame shifting"="#fb9a99",
"Cassette in UTR"="#a6cee3",
"In-frame PTC"="#fb9a99",
"Preserves frame"="#a6cee3"
),
labels=c(
"Productive"="In annotated productive transcript",
"NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame"
)) +
scale_x_continuous(expand=c(0,0)) +
Rotate_x_labels +
guides(fill = guide_legend(order = 2),
color = guide_legend(order = 1)) +
theme(legend.position = "none") +
theme(
strip.background = element_blank(),
strip.text.x = element_blank()
) +
labs(title="Predicted transcript effects of induced cassete exons", x="Number risdiplam-induced cassette exons", fill="Reason", y=NULL, color="Transcript effect")
P.bar.simpler
One more version… with fixed scale… borrowed code from here
position_stack_and_nudge <- function(x = 0, y = 0, vjust = 1, reverse = FALSE) {
ggproto(NULL, PositionStackAndNudge,
x = x,
y = y,
vjust = vjust,
reverse = reverse
)
}
#' @rdname ggplot2-ggproto
#' @format NULL
#' @usage NULL
#' @noRd
PositionStackAndNudge <- ggproto("PositionStackAndNudge", PositionStack,
x = 0,
y = 0,
setup_params = function(self, data) {
c(
list(x = self$x, y = self$y),
ggproto_parent(PositionStack, self)$setup_params(data)
)
},
compute_layer = function(self, data, params, panel) {
# operate on the stacked positions (updated in August 2020)
data = ggproto_parent(PositionStack, self)$compute_layer(data, params, panel)
x_orig <- data$x
y_orig <- data$y
# transform only the dimensions for which non-zero nudging is requested
if (any(params$x != 0)) {
if (any(params$y != 0)) {
data <- transform_position(data, function(x) x + params$x, function(y) y + params$y)
} else {
data <- transform_position(data, function(x) x + params$x, NULL)
}
} else if (any(params$y != 0)) {
data <- transform_position(data, function(x) x, function(y) y + params$y)
}
data$nudge_x <- data$x
data$nudge_y <- data$y
data$x <- x_orig
data$y <- y_orig
data
},
compute_panel = function(self, data, params, scales) {
ggproto_parent(PositionStack, self)$compute_panel(data, params, scales)
}
)
P.bar.simpler.fixed <- P.bar.dat %>%
count(MergedEffect, ColorEffect, IsAnnotated) %>%
mutate(MergedEffect=factor(MergedEffect, levels=c("Productive", "Cassette in UTR","Preserves frame", "NMD", "In-frame PTC", "Frame shifting"))) %>%
mutate(IsAnnotated = recode(IsAnnotated, "Annotated by Gencode"="Annotated")) %>%
ggplot(aes(y=IsAnnotated, x=n, fill=MergedEffect)) +
geom_col(position="stack", color='black') +
# geom_text(aes(label=n), color='black', position=position_stack(vjust=0.5)) +
# geom_text_repel(aes(label=MergedEffect), color='black', position = position_stack_and_nudge(vjust = 0.5, y = 0.2),
# direction = 'x') +
scale_fill_manual(values=c(
"Productive"="#1f78b4",
"NMD"="#e31a1c",
"Frame shifting"="#fb9a99",
"Cassette in UTR"="#a6cee3",
"In-frame PTC"="#fb9a99",
"Preserves frame"="#a6cee3"
),
labels=c(
"Productive"="In annotated productive transcript",
"NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame"
)) +
scale_x_continuous(expand=c(0,0)) +
Rotate_x_labels +
guides(fill = guide_legend(order = 2),
color = guide_legend(order = 1)) +
theme(legend.position = "none") +
theme(
strip.background = element_blank(),
strip.text.x = element_blank()
) +
labs(title="Predicted transcript effects of induced cassete exons", x="Number risdiplam-induced cassette exons", fill="Reason", y=NULL, color="Transcript effect") +
facet_wrap(~IsAnnotated, nrow=2, scales = "free_y")
P.bar.simpler.fixed
P.boxplot.dat <- P.bar.dat %>%
dplyr::select(-symbol) %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1")) %>%
left_join(
volcano.dat %>%
filter(dose.nM == 3160) %>%
distinct(gene, dose.nM, dataset, .keep_all=T) %>%
dplyr::select(gene, dose.nM, dataset, logFC) %>%
pivot_wider(names_from="dataset", values_from="logFC"),
by="gene"
) %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id")) %>%
# filter(hgnc_symbol %in% c("HTT", "SMN2", "GALC", "STAT1", "KCNT2", "FHOD3", "FOXM1))
dplyr::select(GAGTInt, ExonStart_Cassette, ExonStop_Cassette, gene, hgnc_symbol, chRNA, polyA, ColorEffect, MergedEffect, IsAnnotated) %>%
gather("dataset", "log2FC", chRNA, polyA)
P.box.small <- P.boxplot.dat %>%
mutate(ColorEffect = factor(ColorEffect, levels=c("Productive", "NMD"))) %>%
ggplot(aes(y=log2FC, x=ColorEffect, color=ColorEffect)) +
geom_hline(yintercept=0) +
geom_boxplot(outlier.shape=NA, alpha=1, fill="white") +
geom_segment(x="Productive", xend="NMD", y=2.5, yend=2.5, color='black') +
geom_point(position=position_jitterdodge(jitter.width=0.5), alpha=0.3) +
geom_text(data = . %>%
group_by(dataset) %>%
do(w = wilcox.test(log2FC~ColorEffect, data=., paired=FALSE)) %>%
summarise(dataset, Wilcox = format.pval(w$p.value, digits=3)) %>%
ungroup() %>%
mutate(label=str_glue("P={Wilcox}")),
aes(label=label, group=NULL),
x=1.5, y=2.5, color='black', vjust=-1) +
scale_color_manual(values=c( "#1f78b4","#e31a1c")) +
facet_wrap(~dataset) +
labs(y=expression("Host gene "*log["2"]*"FC"), x="Predicted transcript effect") +
Rotate_x_labels +
theme(legend.position="none")
P.box.small
P.box <- P.boxplot.dat %>%
mutate(ColorEffect = factor(ColorEffect, levels=c("Productive", "NMD"))) %>%
ggplot(aes(y=log2FC, x=ColorEffect, color=interaction(IsAnnotated, ColorEffect), group=interaction(IsAnnotated, ColorEffect))) +
geom_hline(yintercept=0) +
geom_boxplot(outlier.shape=NA, alpha=1, fill="white") +
geom_segment(x="Productive", xend="NMD", y=2.5, yend=2.5, color='black') +
geom_point(position=position_jitterdodge(jitter.width=0.5), alpha=0.3) +
geom_text(data = . %>%
group_by(dataset) %>%
do(w = wilcox.test(log2FC~ColorEffect, data=., paired=FALSE)) %>%
summarise(dataset, Wilcox = format.pval(w$p.value, digits=3)) %>%
ungroup() %>%
mutate(label=str_glue("P={Wilcox}")),
aes(label=label, group=NULL),
x=1.5, y=2.5, color='black', vjust=-1) +
scale_color_manual(values=c( "#1f78b4", "#a6cee3", "#e31a1c", "#fb9a99")) +
facet_wrap(~dataset) +
labs(y=expression("Host gene "*log["2"]*"FC"), x="Predicted transcript effect") +
Rotate_x_labels +
theme(legend.position="none")
P.box
Let’s quickly check some genes of interest in this set of NMD-predicted cassette exon targets
P.boxplot.dat %>%
filter(ColorEffect == "NMD") %>%
filter(hgnc_symbol %in% c("HTT", "SMN2", "GALC", "STAT1", "KCNT2", "FHOD3", "FOXM1", "FOS", "PIEZO2", "MYB", "SRSF10"))
# A tibble: 12 × 10
GAGTInt ExonStart_Casse… ExonStop_Casset… gene hgnc_symbol ColorEffect
<chr> <dbl> <dbl> <chr> <chr> <chr>
1 chr12:285966… 2861378 2861412 ENSG… FOXM1 NMD
2 chr18:107021… 10702545 10702669 ENSG… PIEZO2 NMD
3 chr18:109112… 10963754 10963891 ENSG… PIEZO2 NMD
4 chr2:1909770… 190978527 190978606 ENSG… STAT1 NMD
5 chr4:3213736… 3213621 3213736 ENSG… HTT NMD
6 chr6:1351995… 135199525 135199581 ENSG… MYB NMD
7 chr12:285966… 2861378 2861412 ENSG… FOXM1 NMD
8 chr18:107021… 10702545 10702669 ENSG… PIEZO2 NMD
9 chr18:109112… 10963754 10963891 ENSG… PIEZO2 NMD
10 chr2:1909770… 190978527 190978606 ENSG… STAT1 NMD
11 chr4:3213736… 3213621 3213736 ENSG… HTT NMD
12 chr6:1351995… 135199525 135199581 ENSG… MYB NMD
# … with 4 more variables: MergedEffect <chr>, IsAnnotated <chr>,
# dataset <chr>, log2FC <dbl>
P.boxplot.dat %>%
filter(ColorEffect == "NMD") %>%
pull(hgnc_symbol) %>% unique()
[1] "RERE" "PIK3CD" "SPEN" "FUCA1" "TOE1"
[6] "NSUN4" "IL12RB2" "COL24A1" "DIPK1A" "POGZ"
[11] "VAMP4" "COP1" "SOAT1" "DHX9" "NIBAN1"
[16] "PLA2G4A" "DENND1B" "MDM4" "INTS7" "MLLT10"
[21] "ARHGAP12" "MARCHF8" "VDAC2" "MMS19" "DNMBP"
[26] "NT5C2" "PDCD4" "CASP7" "WDR11" "DENND5A"
[31] "BTBD10" "CSTF3" "APIP" "MARK2" "UVRAG"
[36] "EMSY" "ALG8" "HIKESHI" "CEP57" "CASP4"
[41] "SIK3" "ERC1" "FOXM1" "PLEKHG6" "KIF21A"
[46] "GXYLT1" "PPHLN1" "TMBIM6" "GTSF1" "DGKA"
[51] "PLXNC1" "PPP1CC" "HECTD4" "RBM19" "MED13L"
[56] "SPPL3" "ZCCHC8" "DDX51" "POLR1D" "SUPT20H"
[61] "COG3" "FNDC3A" "SPRYD7" "TDRD3" "MZT1"
[66] "MYCBP2" "BIVM-ERCC5" "ERCC5" "RAB2B" "SOS2"
[71] "CDKL1" "PPP2R5E" "COQ6" "ITPK1" "NIPA1"
[76] "SNAP23" "TCF12" "CSNK1G1" "" "CLPX"
[81] "SCAPER" "PDE8A" "PDXDC1" "ABCC1" "TNRC6A"
[86] "CYLD" "OGFOD1" "SLC7A6" "SF3B3" "PIEZO1"
[91] "SCO1" "BLTP2" "NSRP1" "UTP6" "MYO1D"
[96] "HDAC5" "LUC7L3" "TRIM25" "DDX42" "GRB2"
[101] "PIEZO2" "KDSR" "AKAP8L" "DDX49" "FBL"
[106] "SNRPD2" "BICRA" "ADAM17" "LDAH" "ATAD2B"
[111] "MRPL33" "BABAM2" "MEMO1" "FAM161A" "GCFC2"
[116] "KCMF1" "C2orf49" "DARS1" "MARCHF7" "TANK"
[121] "DCAF17" "STAT1" "ANKRD44" "TYW5" "STRADB"
[126] "IDH1" "COPS7B" "TASP1" "XRN2" "PIGU"
[131] "DNTTIP1" "SLC9A8" "PTPN1" "IFNGR2" "TNRC6B"
[136] "SLC25A17" "EP300" "NUP50" "TAMM41" "GOLGA4"
[141] "LARS2" "PXK" "CBLB" "CD47" "CIP2A"
[146] "TMEM39A" "SLC41A3" "DNAJC13" "RNF13" "TFRC"
[151] "NELFA" "HTT" "ARAP2" "SEC31A" "NFKB1"
[156] "IRF2" "EXOC3" "LPCAT1" "CCT5" "SKP2"
[161] "OXCT1" "ERCC8" "PIK3R1" "GTF2H2" "ANKRA2"
[166] "TBCA" "TENT2" "DHFR" "MCTP1" "FAM174A"
[171] "FER" "HSD17B4" "SNX24" "DIAPH1" "JARID2"
[176] "C6orf62" "KIFC1" "CUL7" "EFHC1" "RARS2"
[181] "MDN1" "USP45" "CRYBG1" "RPF2" "REV3L"
[186] "MYB" "WDR27" "HYCC1" "IGF2BP3" "AVL9"
[191] "NT5C3A" "ELMO1" "PSMA2" "ZNF138" "CPSF4"
[196] "MDFIC" "SSBP1" "MCPH1" "PCM1" "CHRNA6"
[201] "PRKDC" "PDE7A" "MTERF3" "UBR5" "NUDCD1"
[206] "TAF2" "FOCAD" "TJP2" "GNG10" "FBXW2"
[211] "DENND1A" "PBX3" "DPM2" "FNBP1" "RALGDS"
[216] "EHMT1"
Threshold <- log2(1.5)
Scatter.dat <- volcano.dat %>%
filter(dose.nM == 3160) %>%
distinct(gene, dose.nM, dataset, .keep_all=T) %>%
dplyr::select(gene, dose.nM, dataset, logFC, PValue, FDR) %>%
pivot_wider(names_from="dataset", values_from=c("logFC", "PValue", "FDR")) %>%
mutate(Category = case_when(
# FDR_polyA < 0.1 & !(FDR_chRNA < 0.1 & sign(logFC_chRNA)==sign(logFC_polyA)) ~ "polyA-specific change",
FDR_polyA <0.1 & FDR_chRNA <0.1 & logFC_polyA-logFC_chRNA<Threshold & logFC_polyA-logFC_chRNA > -Threshold ~ "txn",
FDR_polyA < 0.1 & logFC_polyA-logFC_chRNA>Threshold ~ "post-txn up",
FDR_polyA < 0.1 & logFC_polyA-logFC_chRNA < -Threshold ~ "post-txn down",
TRUE ~ "Other"
))
Scatter.dat %>%
filter(!is.na(logFC_polyA) & !is.na(logFC_chRNA)) %>%
count(Category)
# A tibble: 4 × 2
Category n
<chr> <int>
1 Other 6928
2 post-txn down 628
3 post-txn up 340
4 txn 544
Enrichment.test <- Scatter.dat %>%
filter(!is.na(logFC_polyA) & !is.na(logFC_chRNA)) %>%
left_join(
P.boxplot.dat %>%
dplyr::select(gene, ColorEffect) %>%
filter(ColorEffect == "NMD"),
by="gene"
) %>%
mutate(IsPostTxn = Category == "post-txn down") %>%
mutate(IsNMD = ColorEffect == "NMD") %>%
replace_na(list(IsNMD=F)) %>%
count(IsPostTxn, IsNMD) %>%
pivot_wider(names_from="IsPostTxn", values_from="n") %>%
column_to_rownames("IsNMD") %>%
fisher.test()
library(ggnewscale)
P.scatter <- Scatter.dat %>%
ggplot(aes(x=logFC_chRNA, y=logFC_polyA)) +
geom_abline(color='gray', slope=1, intercept=0) +
geom_point(alpha=0.3, aes(color=Category)) +
scale_color_manual(
values=c(
"Other"="#bdbdbd",
"post-txn up"="#6a3d9a",
"post-txn down"="#e31a1c",
"txn"="#b15928"
),
labels=c(
"Other"="Unchanged; n=6928)",
"post-txn up"="Post txn up; n=340",
"post-txn down"="Post txn down; n=628",
"txn"="Txn change; n=544"
),
name="Expression change type"
) +
guides(colour = guide_legend(override.aes = list(alpha = 1))) +
new_scale_color() +
geom_point(
data = Scatter.dat %>%
inner_join(
P.boxplot.dat %>%
dplyr::select(gene, ColorEffect) %>%
filter(ColorEffect == "NMD"),
by="gene"
) %>%
distinct(gene, .keep_all=T),
aes(color='NMD targets predicted from splicing changes'),
fill="#e31a1c",
shape=21
) +
scale_color_manual(values=c('NMD targets predicted from splicing changes'="black"), labels=c("NMD targets predicted from splicing changes"=str_wrap("NMD targets predicted from splicing changes", 20)), name=NULL) +
scale_x_continuous(limits=c(-3,3)) +
scale_y_continuous(limits=c(-3,3)) +
labs(x=expression("chRNA "*log["2"]*"FC"), y=expression("polyA "*log["2"]*"FC"), caption=str_wrap(str_glue("post-txn down 14.9 fold enriched for set of NMD targets predicted from splicing, P<2x10-16, hypergeometric test"),40)) +
coord_fixed() +
theme_bw()
P.scatter
Scatter.dat %>%
filter(gene == "ENSG00000188529")
# A tibble: 1 × 9
gene dose.nM logFC_chRNA logFC_polyA PValue_chRNA PValue_polyA FDR_chRNA
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSG00000… 3160 -0.415 -0.596 0.000936 0.000414 0.00930
# … with 2 more variables: FDR_polyA <dbl>, Category <chr>
first read in some data
ClusterMax.mat <- leafcutter.counts %>%
as.data.frame() %>%
rownames_to_column("junc") %>%
mutate(cluster=str_replace(junc, "^(.+?):.+?:.+?:(.+)$", "\\1_\\2")) %>%
group_by(cluster) %>%
mutate(across(where(is.numeric), sum)) %>%
ungroup() %>%
dplyr::select(junc, everything(), -cluster) %>%
column_to_rownames("junc") %>%
as.matrix()
PSI.df <- (leafcutter.counts / as.numeric(ClusterMax.mat) * 100) %>%
signif() %>%
as.data.frame() %>%
rownames_to_column("Leafcutter.ID") %>%
mutate(Intron = str_replace(Leafcutter.ID, "(.+?):(.+?):(.+?):clu_.+?_([+-])$", "\\1:\\2:\\3:\\4"))
Intron.Donors <- fread("../code/SmallMolecule/leafcutter/JuncfilesMerged.annotated.basic.bed.5ss.tab.gz", col.names = c("Intron", "DonorSeq", "DonorScore")) %>%
mutate(Intron = str_replace(Intron, "(.+?)_(.+?)_(.+?)_(.+?)::.+?$", "\\1:\\2:\\3:\\4"))
PSI.tidy <- PSI.df %>%
left_join(Intron.Donors) %>%
gather("Sample", "PSI",contains("_")) %>%
inner_join(
leafcutter.counts %>%
as.data.frame() %>%
rownames_to_column("Leafcutter.ID") %>%
mutate(Intron = str_replace(Leafcutter.ID, "(.+?):(.+?):(.+?):clu_.+?_([+-])$", "\\1:\\2:\\3:\\4")) %>%
gather("Sample", "Counts",contains("_"))
) %>%
separate(Sample, into=c("treatment", "dose.nM", "Cell.type", "LibraryType", "rep"), sep="_", convert=T) %>%
replace_na(list(dose.nM=0))
Mean.PSI <- PSI.tidy %>%
group_by(Intron, dose.nM, DonorSeq, LibraryType) %>%
summarise(PSI=mean(PSI, na.rm=T)) %>%
ungroup()
inner_join(
Mean.PSI,
Mean.PSI %>%
filter(dose.nM==0) %>%
dplyr::select(Intron, DonorSeq, PSI, LibraryType),
by=c("Intron", "DonorSeq", "LibraryType")
) %>%
mutate(Is.GA.GT = substr(DonorSeq, 3,4)) %>%
mutate(DeltaPSI=(PSI.x-PSI.y)) %>%
group_by(dose.nM, Is.GA.GT, LibraryType) %>%
summarise(PSI.summary = mean(DeltaPSI, na.rm=T)) %>%
ungroup() %>%
drop_na() %>%
ggplot(aes(x=dose.nM, y=PSI.summary, color=Is.GA.GT=="GA", group=interaction(Is.GA.GT, LibraryType))) +
geom_line(aes(linetype=LibraryType)) +
geom_point() +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0)) +
# facet_wrap(~Is.GA.GT, scales = "free_y") +
# scale_y_continuous(trans="log1p") +
theme(axis.text.x = element_text(angle = 45, vjust = 1)) +
labs(y="Mean deltaPSI", caption=str_wrap("Average DeltaPSI across all introns", 30))
Ok now let’s bootstrap some confidence intervals…
dat.to.plot.GAGT.enrichment <- inner_join(
Mean.PSI,
Mean.PSI %>%
filter(dose.nM==0) %>%
dplyr::select(Intron, DonorSeq, PSI, LibraryType),
by=c("Intron", "DonorSeq", "LibraryType")
) %>%
mutate(Is.GA.GT = substr(DonorSeq, 3,4)) %>%
mutate(DinculeotideCategory = case_when(
Is.GA.GT == "AG" ~ "AG|GU",
Is.GA.GT == "GA" ~ "GA|GU",
TRUE ~ "not (AG or GA)|GU"
)) %>%
mutate(DeltaPSI=(PSI.x-PSI.y))
# Bootstrap resamples for confidence intervals
NumIterations <- 200
ResampleResults <- list()
for (i in 1:NumIterations){
df <- dat.to.plot.GAGT.enrichment %>%
group_by(dose.nM, DinculeotideCategory, LibraryType) %>%
sample_frac(replace=T) %>%
ungroup() %>%
group_by(dose.nM, DinculeotideCategory, LibraryType) %>%
summarise(PSI.summary = mean(DeltaPSI, na.rm=T)) %>%
ungroup() %>%
drop_na()
ResampleResults[[i]] <- df
}
dat.to.plot.GAGT.enrichment %>% distinct(Intron, .keep_all=T) %>%
count(DinculeotideCategory)
labels <- c("AG|GU"="AG|GU; n=90011", "GA|GU"="GA|GU; n=3263", "not (AG or GA)|GU"= "not (AG or GA)|GU; n=63548")
P.MeanDeltaPSI <- bind_rows(ResampleResults, .id="iteration") %>%
group_by(dose.nM, DinculeotideCategory, LibraryType) %>%
summarise(enframe(quantile(PSI.summary, c(0.025, 0.5, 0.975)), "ResampledQuantile", "MeanDeltaPSI")) %>%
pivot_wider(names_from="ResampledQuantile", values_from="MeanDeltaPSI") %>%
ggplot(aes(x=dose.nM, y=`50%`, color=DinculeotideCategory,linetype=LibraryType, group=interaction(DinculeotideCategory, LibraryType))) +
geom_ribbon(aes(ymin=`2.5%`, ymax=`97.5%`, fill=DinculeotideCategory), alpha=0.1, size=0.5) +
geom_line(size=1) +
geom_point() +
scale_colour_brewer(type="qual", palette = "Dark2", labels=labels, name="5'ss motif") +
scale_fill_brewer(type="qual", palette = "Dark2", labels=labels, name="5'ss motif") +
# facet_wrap(~Is.GA.GT, scales = "free_y") +
# scale_y_continuous(trans="log1p") +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
# scale_x_break(c(2,2)) +
# theme(axis.text.x.top = element_blank(),
# axis.ticks.x.top = element_blank(),
# axis.line.x.top = element_blank(),
# plot.background = element_blank()) +
Rotate_x_labels +
labs(y=expression("Mean "*Delta*"PSI"), x="[risdiplam] (nanomolar)")
P.MeanDeltaPSI
SpearmanCorr <- read_tsv("../code/SmallMolecule/FitModels/Data/polyA_GAGTIntrons.tsv.gz")
SpearmanCorr %>%
distinct(Intron, .keep_all=T) %>%
filter(substr(DonorSeq, 3,4)=="GA") %>%
filter(q<0.1) %>%
mutate(numberInts = n()) %>%
filter(spearman == min(spearman))
# A tibble: 1 × 16
Intron Leafcutter.ID DonorSeq DonorScore treatment dose.nM Cell.type
<chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
1 chr19:58571429:… chr19:585714… AGGAGTA… 2.87 DMSO 0 LCL
# … with 9 more variables: LibraryType <chr>, rep <dbl>, PSI <dbl>,
# Counts <dbl>, n <dbl>, spearman <dbl>, spearman.p <dbl>, q <dbl>,
# numberInts <int>
Translated.GAGT_CassetteExons %>% nrow()
[1] 316
P.spearman.dist <- SpearmanCorr %>%
distinct(Intron, .keep_all=T) %>%
mutate(Is.GA.GT = substr(DonorSeq, 3,4)) %>%
mutate(DinculeotideCategory = case_when(
Is.GA.GT == "AG" ~ "AG|GU",
Is.GA.GT == "GA" ~ "GA|GU",
TRUE ~ "not (AG or GA)|GU"
)) %>%
filter(!is.na(spearman)) %>%
# mutate(DinculeotideCategory = factor(DinculeotideCategory, levels=c("GA|GU", "AG|GU", "not (AG or GA)|GU"))) %>%
arrange(desc(DinculeotideCategory)) %>%
ggplot(aes(x=spearman, group=Is.GA.GT, color=DinculeotideCategory)) +
geom_vline(xintercept=0) +
scale_colour_brewer(type="qual", palette = "Dark2", labels=labels, name="5'ss motif") +
stat_ecdf() +
labs(y="Cumulative fraction", x="dose:PSI spearman coef", caption=str_wrap("641 GA|GU induced introns with significant spearman correlation (FDR<10%, corresponding to spearman coef > 0.775) were used for futher interpretion of effects. 305 of which were classified as cassette exons, translated, and PSI was recalculated from junction trios", 40))
P.spearman.dist
##NEEDS FIXING
PolyA.SplicingEffects.Annotationed <- small.molecule.dat.introns %>%
dplyr::select(1:4) %>%
inner_join(
Translated.GAGT_CassetteExons,
by=c("junc"="GAGTInt")
)
P.scatter.byPrediction.dat <- Scatter.dat %>%
filter(dose.nM==3160) %>%
inner_join(
P.boxplot.dat %>%
filter(dataset=="polyA") %>%
dplyr::select(gene, ColorEffect, MergedEffect),
by="gene"
) %>%
inner_join(
PolyA.SplicingEffects.Annotationed %>%
filter(param == "Pred_3160") %>%
dplyr::select(junc, gene, Estimate) %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1")),
by="gene"
)
MergedEffectLabels <- c(
"Productive"="In annotated productive transcript",
"NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame",
"All NMD"="All predicted NMD exons",
"All productive"="All predicted productive exons"
)
P.scatter.byPrediction <- bind_rows(
P.scatter.byPrediction.dat %>%
mutate(color=MergedEffect),
P.scatter.byPrediction.dat %>%
mutate(color=MergedEffect) %>%
mutate(MergedEffect=recode(ColorEffect, !!!c("NMD"="All NMD", "Productive"="All productive")))
) %>%
mutate(MergedEffect = factor(MergedEffect, levels=c("Productive", "Preserves frame", "Cassette in UTR", "All productive","NMD", "Frame shifting", "In-frame PTC", "All NMD"))) %>%
ggplot(aes(x=Estimate, y=logFC_polyA, color=color)) +
geom_hline(yintercept=0) +
geom_point() +
scale_color_manual(values=c(
"Productive"="#1f78b4",
"NMD"="#e31a1c",
"Frame shifting"="#fb9a99",
"Cassette in UTR"="#a6cee3",
"In-frame PTC"="#fb9a99",
"Preserves frame"="#a6cee3"
),
labels=MergedEffectLabels) +
geom_text( data = . %>%
group_by(MergedEffect) %>%
summarise(cor=cor.test(Estimate,logFC_polyA)[["estimate"]], pval=cor.test(Estimate,logFC_polyA, method='s')[["p.value"]]) %>%
mutate(R = signif(cor, 3), P=format.pval(pval, 3)) %>%
mutate(label = str_glue("rho:{R}\nP:{P}")),
aes(x=-Inf, y=-Inf, label=label),
hjust=-.1, vjust=-0.1, color='black'
) +
# facet_wrap(~MergedEffect, labeller = as_labeller(MergedEffectLabels)) +
facet_wrap(~MergedEffect, nrow=2, labeller = as_labeller(MergedEffectLabels, default=label_wrap_gen(20))) +
theme(legend.position="none") +
labs(y=expression("host gene "*log["2"]*"FC, polyA RNA"), x=expression(Delta*"PSI"), caption="[risdiplam]=3160nM, Effects measured in polyA RNA") +
Rotate_x_labels
P.scatter.byPrediction
Same thing but use expression effects in chRNA.
P.scatter.byPrediction.dat.chRNA <- Scatter.dat %>%
filter(dose.nM==3160) %>%
inner_join(
P.boxplot.dat %>%
filter(dataset=="polyA") %>%
dplyr::select(gene, ColorEffect, MergedEffect),
by="gene"
) %>%
inner_join(
PolyA.SplicingEffects.Annotationed %>%
filter(param=="Pred_3160") %>%
dplyr::select(junc, gene, Estimate) %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1")),
by="gene"
)
MergedEffectLabels <- c(
"Productive"="In annotated productive transcript",
"NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame",
"All NMD"="All predicted NMD exons",
"All productive"="All predicted productive exons"
)
P.scatter.byPrediction.chRNA <- bind_rows(
P.scatter.byPrediction.dat.chRNA %>%
mutate(color=MergedEffect),
P.scatter.byPrediction.dat.chRNA %>%
mutate(color=MergedEffect) %>%
mutate(MergedEffect=recode(ColorEffect, !!!c("NMD"="All NMD", "Productive"="All productive")))
) %>%
mutate(MergedEffect = factor(MergedEffect, levels=c("Productive", "Preserves frame", "Cassette in UTR", "All productive","NMD", "Frame shifting", "In-frame PTC", "All NMD"))) %>%
ggplot(aes(x=Estimate, y=logFC_chRNA, color=color)) +
geom_hline(yintercept=0) +
geom_point() +
scale_color_manual(values=c(
"Productive"="#1f78b4",
"NMD"="#e31a1c",
"Frame shifting"="#fb9a99",
"Cassette in UTR"="#a6cee3",
"In-frame PTC"="#fb9a99",
"Preserves frame"="#a6cee3"
),
labels=MergedEffectLabels) +
geom_text( data = . %>%
group_by(MergedEffect) %>%
summarise(cor=cor.test(Estimate,logFC_chRNA)[["estimate"]], pval=cor.test(Estimate,logFC_chRNA, method='s')[["p.value"]]) %>%
mutate(R = signif(cor, 3), P=format.pval(pval, 3)) %>%
mutate(label = str_glue("rho:{R}\nP:{P}")),
aes(x=-Inf, y=Inf, label=label),
hjust=-.1, vjust=1.1, color='black'
) +
# facet_wrap(~MergedEffect, labeller = as_labeller(MergedEffectLabels)) +
facet_wrap(~MergedEffect, nrow=2, labeller = as_labeller(MergedEffectLabels, default=label_wrap_gen(20))) +
theme(legend.position="none") +
labs(y=expression("host gene "*log["2"]*"FC, chRNA"), x=expression(Delta*"PSI"), caption="[risdiplam]=3160nM, Splicing measured in polyA") +
Rotate_x_labels
P.scatter.byPrediction.chRNA
SplicingDruggableGenes <- P.boxplot.dat %>%
filter(dataset == 'polyA') %>%
filter(ColorEffect == 'NMD') %>%
distinct(gene, .keep_all=T)
nrow(SplicingDruggableGenes)
[1] 219
exons <- read_tsv("../code/ReferenceGenome/Annotations/GTFTools_BasicAnnotations/gencode.v34.chromasomal.exons.sorted.bed", col_names=c("chrom", "start", "stop", "gene_transcript", "score", "strand"))
transcripts_to_genes <- exons %>%
distinct(gene_transcript) %>%
separate(gene_transcript, into=c("gene", "transcript"), sep="_") %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1"))
Expression.table.transcripts <- read_tsv("../code/SmallMolecule/salmon.DMSO.merged.txt")
Expression.table.genes <- Expression.table.transcripts %>%
inner_join(transcripts_to_genes, by=c("Name"="transcript")) %>%
group_by(gene) %>%
summarise_at(vars(contains("DMSO")), sum) %>%
gather("sample", "TPM", -gene) %>%
group_by(gene) %>%
summarise(medianTPM = median(TPM))
hist(log2(Expression.table.genes$medianTPM))
Expression.table.genes %>%
filter(gene == "ENSG00000111640")
# A tibble: 1 × 2
gene medianTPM
<chr> <dbl>
1 ENSG00000111640 4502.
Expression.tidyForMatchingGenes <- Expression.table.genes %>%
mutate(IsSplicingDruggable = gene %in% SplicingDruggableGenes$gene) %>%
arrange(medianTPM) %>%
mutate(LaggingGeneGroup = lag(IsSplicingDruggable)) %>%
mutate(LeadingGeneGroup = lead(IsSplicingDruggable)) %>%
ungroup()
Merged.WithExpressionMatchedControlGenes <-
bind_rows(
Expression.tidyForMatchingGenes %>%
filter(IsSplicingDruggable) %>%
mutate(Group = "SplicingPerturbed"),
Expression.tidyForMatchingGenes %>%
filter(!IsSplicingDruggable & LaggingGeneGroup) %>%
mutate(Group = "SplicingUnperturbedControl"),
Expression.tidyForMatchingGenes %>%
filter(!IsSplicingDruggable & LeadingGeneGroup) %>%
mutate(Group = "SplicingUnperturbedControl"),
) %>%
dplyr::select(gene, medianTPM, Group)
Merged.WithExpressionMatchedControlGenes %>%
ggplot(aes(medianTPM)) +
stat_ecdf() +
scale_x_continuous(trans='log10') +
facet_wrap(~Group)
Now download the supplemental table from This paper…
url1<-'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6321762/bin/NIHMS80906-supplement-Table_S1.xlsx'
p1f <- tempfile()
download.file(url1, p1f, mode="wb")
p1<-read_excel(path = p1f, sheet = 1)
head(p1)
# A tibble: 6 × 12
ensembl_gene_id druggability_tier hgnc_names chr_b37 start_b37 end_b37 strand
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 ENSG00000000938 Tier 1 FGR 1 27938575 2.80e7 -1
2 ENSG00000001626 Tier 1 CFTR 7 117105838 1.17e8 1
3 ENSG00000001630 Tier 1 CYP51A1 7 91741465 9.18e7 -1
4 ENSG00000002549 Tier 1 LAP3 4 17578815 1.76e7 1
5 ENSG00000004468 Tier 1 CD38 4 15779898 1.59e7 1
6 ENSG00000004478 Tier 1 FKBP4 12 2904119 2.91e6 1
# … with 5 more variables: description <chr>, no_of_gwas_regions <dbl>,
# small_mol_druggable <chr>, bio_druggable <chr>, adme_gene <chr>
Also, make sure to subset all gene sets to only include protein coding ones.
CodingGenes <- read_tsv("../data/mart_export.txt.gz") %>%
distinct(`Gene stable ID`) %>%
dplyr::select(gene = "Gene stable ID")
GeneCategories <- p1 %>%
filter(small_mol_druggable=="Y") %>%
left_join(Expression.table.genes, by=c("ensembl_gene_id"="gene")) %>%
dplyr::select(Group=druggability_tier, gene=ensembl_gene_id, medianTPM) %>%
bind_rows(
Merged.WithExpressionMatchedControlGenes,
Expression.table.genes %>%
mutate(Group = "Whole protein coding genome")
) %>%
filter(gene %in% CodingGenes$gene)
Now check number of exons and gene length of different druggability categories
TopTranscriptPerGene <- Expression.table.transcripts %>%
# # summarise_at(vars(contains("DMSO")), sum) %>%
gather("sample", "TPM", -Name) %>%
group_by(Name) %>%
summarise(medianTPM = median(TPM)) %>%
inner_join(transcripts_to_genes, by=c("Name"="transcript")) %>%
group_by(gene) %>%
filter(medianTPM == max(medianTPM)) %>%
ungroup() %>%
distinct(gene, .keep_all=T)
NumExonsAndLength <- exons %>%
group_by(gene_transcript) %>%
summarise(NumExons = n(),
Min = min(start),
Max = max(stop)) %>%
mutate(GeneLength = Max - Min) %>%
separate(gene_transcript, into=c("gene", "transcript"), sep="_") %>%
filter(transcript %in% TopTranscriptPerGene$Name) %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1")) %>%
dplyr::select(-Min, -Max)
NumExonsAndLength %>%
filter(gene %in% CodingGenes$gene) %>%
ggplot(aes(x=GeneLength)) +
stat_ecdf() +
coord_cartesian(xlim=c(1E3, 1E6)) +
scale_x_continuous(trans='log10') +
annotation_logticks() +
theme_bw()
GeneCategories %>%
left_join(NumExonsAndLength) %>%
filter(!Group %in% c("Tier 2", "Tier 3A", "Tier 3B")) %>%
gather(key="Feature", value="value", NumExons, GeneLength, medianTPM) %>%
ggplot(aes(x=value, color=Group)) +
stat_ecdf() +
scale_x_continuous(trans='log10') +
annotation_logticks() +
facet_wrap(~Feature, scales = "free", ncol=1) +
labs(y="ecdf", color="Druggability gene group")
GeneCategories %>%
left_join(NumExonsAndLength) %>%
gather(key="Feature", value="value", NumExons, GeneLength, medianTPM) %>%
filter(Feature == "NumExons") %>%
ggplot(aes(x=value, color=Group)) +
stat_ecdf() +
scale_x_continuous(trans='log10') +
annotation_logticks() +
facet_wrap(~Feature, scales = "free", ncol=1) +
labs(y="ecdf", color="Druggability gene group")
Save plots for number of exons and host intron length
GeneCategories %>%
left_join(NumExonsAndLength) %>%
gather(key="Feature", value="value", NumExons, GeneLength, medianTPM) %>%
filter(Feature == "NumExons") %>%
group_by(Group, Feature) %>%
summarise(Med = median(value, na.rm=T))
# A tibble: 7 × 3
# Groups: Group [7]
Group Feature Med
<chr> <chr> <dbl>
1 SplicingPerturbed NumExons 14
2 SplicingUnperturbedControl NumExons 9
3 Tier 1 NumExons 11
4 Tier 2 NumExons 11
5 Tier 3A NumExons 13
6 Tier 3B NumExons 6
7 Whole protein coding genome NumExons 8
Introns.Annotated.Lengths <- read_tsv("../data/IntronAnnotationsFromYang.tsv.gz") %>%
mutate(gene = str_replace(gene, "^(.+?)\\..+$", "\\1")) %>%
filter(SemiSupergroupAnnotations == "basic tag") %>%
inner_join(
GeneCategories %>%
filter(Group == "SplicingPerturbed")) %>%
mutate(Len = end - start) %>%
mutate(Group = "AllOtherIntronInGenes") %>%
dplyr::select(Len, Group)
HostIntrons <- read_tsv("../output/SmallMoleculeGAGT_CassetteExonclusters.bed", col_names=paste0("V", 1:9)) %>%
filter(str_detect(V4, "junc.skipping")) %>%
mutate(Len = V3-V2) %>%
mutate(Group = "Host introns") %>%
dplyr::select(Len, Group)
bind_rows(
HostIntrons, Introns.Annotated.Lengths
) %>%
wilcox.test(Len ~ Group, data=.)
Wilcoxon rank sum test with continuity correction
data: Len by Group
W = 564921, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
bind_rows(
HostIntrons, Introns.Annotated.Lengths
) %>%
group_by(Group) %>%
summarise(medLen = median(Len))
# A tibble: 2 × 2
Group medLen
<chr> <dbl>
1 AllOtherIntronInGenes 2232.
2 Host introns 6138.
P.IntLen <- bind_rows(
HostIntrons, Introns.Annotated.Lengths
) %>%
ggplot(aes(x=Len, color=Group)) +
stat_ecdf() +
scale_x_continuous(trans='log10') +
coord_cartesian(xlim=c(100, 1E5)) +
scale_color_manual(values=c("Host introns"="#e31a1c", "AllOtherIntronInGenes"="#969696")) +
labs(x="Intron length (bp)", y="Cumulative fraction", caption="2233 vs 6139bp medians, P<2.2E-16")
P.IntLen
GeneCategories$Group %>% unique()
[1] "Tier 1" "Tier 2"
[3] "Tier 3A" "Tier 3B"
[5] "SplicingPerturbed" "SplicingUnperturbedControl"
[7] "Whole protein coding genome"
GeneCategories %>%
left_join(NumExonsAndLength) %>%
gather(key="Feature", value="value", NumExons, GeneLength, medianTPM) %>%
filter(Feature == "NumExons") %>%
filter(Group %in% c("SplicingPerturbed", "SplicingUnperturbedControl")) %>%
wilcox.test(value~Group, data=.)
Wilcoxon rank sum test with continuity correction
data: value by Group
W = 57906, p-value = 5.186e-12
alternative hypothesis: true location shift is not equal to 0
P.NumExons <-
GeneCategories %>%
left_join(NumExonsAndLength) %>%
gather(key="Feature", value="value", NumExons, GeneLength, medianTPM) %>%
filter(Feature == "NumExons") %>%
filter(!Group %in% c("Tier 2", "Tier 3A", "Tier 3B", "Tier 1")) %>%
ggplot(aes(x=value, color=Group)) +
stat_ecdf() +
# scale_x_continuous(trans='log10') +
# annotation_logticks(sides='b') +
coord_cartesian(xlim=c(1, 30)) +
scale_color_manual(
labels=c(
"SplicingPerturbed"="219 Risdiplam-induced NMD targets",
"SplicingUnperturbedControl"="389 expression-matched genes",
"Whole protein coding genome"="Whole genome",
"Tier 1"="1059 targets of approved small molecules"),
values=c(
"SplicingPerturbed"="#e31a1c",
"SplicingUnperturbedControl"="#1f78b4",
"Whole protein coding genome"="#000000",
"Tier 1"="#737373")
) +
labs(y="Cumulative fraction", color="Druggability gene group", x="Number exons in gene", caption="median 14 vs 9, P=5.2x10-12")
P.NumExons
Ok, that is one way to show that the risdiplam-perturbed genes tend to be in longer genes with more exons, compared to a matched set of control genes or compared to previously druggable genes. But for sake of compactness, since I already know I also want to do an enrichment analysis including GO categories (ie GPCRs, etc), let’s also look make gene categories by length and present for enrichment alongside the GO enrichment analysis…
library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
ensembl_to_symbols <- getBM(attributes= c("ensembl_gene_id","hgnc_symbol"),mart=ensembl)
#Manually copy paste links from MsigDB
go=c(
"GO:0015075"="https://www.gsea-msigdb.org/gsea/msigdb/human/download_geneset.jsp?geneSetName=GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY&fileType=grp",
"GO:0004930"="https://www.gsea-msigdb.org/gsea/msigdb/human/download_geneset.jsp?geneSetName=GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY&fileType=grp",
"GO:0004984"="https://www.gsea-msigdb.org/gsea/msigdb/human/download_geneset.jsp?geneSetName=GOMF_OLFACTORY_RECEPTOR_ACTIVITY&fileType=grp",
"GO:0004879"="https://www.gsea-msigdb.org/gsea/msigdb/human/download_geneset.jsp?geneSetName=GOMF_NUCLEAR_RECEPTOR_ACTIVITY&fileType=grp",
"GO:0016301"="https://www.gsea-msigdb.org/gsea/msigdb/human/download_geneset.jsp?geneSetName=GOMF_KINASE_ACTIVITY&fileType=grp",
"GO:0003700"="https://www.gsea-msigdb.org/gsea/msigdb/human/download_geneset.jsp?geneSetName=GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY&fileType=grp")
GO.genes.of.interest <- lapply(go, read.table, skip=2, col.names=c("hgnc_symbol")) %>%
bind_rows(.id="GO") %>%
mutate(GO_Name = recode(GO, !!!go)) %>%
mutate(GO_Name = str_replace(GO_Name, ".+?geneSetName=(.+?)&fileType=grp$", "\\1")) %>%
left_join(ensembl_to_symbols)
GO.genes.of.interest %>%
distinct(hgnc_symbol, GO_Name) %>%
count(GO_Name)
GO_Name n
1 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY 1454
2 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY 871
3 GOMF_KINASE_ACTIVITY 743
4 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 726
5 GOMF_NUCLEAR_RECEPTOR_ACTIVITY 52
6 GOMF_OLFACTORY_RECEPTOR_ACTIVITY 429
GPCRs <- GO.genes.of.interest %>%
filter(GO_Name == "GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY") %>%
pull(hgnc_symbol) %>% unique() %>%
setdiff(
GO.genes.of.interest %>%
filter(GO_Name == "GOMF_OLFACTORY_RECEPTOR_ACTIVITY") %>%
pull(hgnc_symbol) %>% unique()
)
Now let’s do the GO enrichment tests. I’ll actually just do them “by
hand” with the fisher.test
(hypergeometric test) function
rather than using a dedicated package to systematically look for
enrichment across all GO categories, which I am less interested in.
library(broom)
GO.genes.of.interest.totest <-
bind_rows(
# filter out olfactor GPCRs from GPCR group
GO.genes.of.interest %>%
filter(ensembl_gene_id %in% CodingGenes$gene) %>%
filter(!GO_Name=="GOMF_OLFACTORY_RECEPTOR_ACTIVITY") %>%
filter(!(GO_Name=="GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY" & !hgnc_symbol %in% GPCRs)),
NumExonsAndLength %>%
filter(gene %in% CodingGenes$gene) %>%
# mutate(GO_Name = cut_number(GeneLength, 4, ordered_result=T)) %>%
# distinct(GO_Name)
mutate(GO_Name = factor(ntile(GeneLength, 4))) %>%
# mutate(GO_Name = cut(GeneLength, breaks=c(-Inf,1E3, 1E4, 1E5, Inf), include.lowest=T)) %>%
filter(!is.na(GO_Name)) %>%
dplyr::select(ensembl_gene_id=gene, GO_Name)
)
GO_Categories <- GO.genes.of.interest.totest %>%
pull(GO_Name) %>% unique()
count(GO.genes.of.interest.totest, GO_Name)
GO_Name n
1 1 4621
2 2 4621
3 3 4621
4 4 4621
5 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY 1537
6 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY 489
7 GOMF_KINASE_ACTIVITY 793
8 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 778
9 GOMF_NUCLEAR_RECEPTOR_ACTIVITY 57
DruggabilityCategories <- GeneCategories %>%
filter(!Group=="Whole protein coding genome") %>% pull(Group) %>% unique()
DruggabilityCategories
[1] "Tier 1" "Tier 2"
[3] "Tier 3A" "Tier 3B"
[5] "SplicingPerturbed" "SplicingUnperturbedControl"
GeneCategories.totest <- GeneCategories %>%
sample_frac() %>%
group_by(Group) %>%
mutate(n = row_number()) %>%
ungroup()
# filter(!Group == "Tier 1" | (Group == "Tier 1" & n <= 116))
count(GeneCategories.totest, Group)
# A tibble: 7 × 2
Group n
<chr> <int>
1 SplicingPerturbed 219
2 SplicingUnperturbedControl 396
3 Tier 1 1059
4 Tier 2 648
5 Tier 3A 67
6 Tier 3B 516
7 Whole protein coding genome 18484
NumExonsAndLength %>%
filter(gene %in% CodingGenes$gene) %>%
mutate(N = cut_number(GeneLength, 4)) %>%
distinct(N)
# A tibble: 4 × 1
N
<fct>
1 (8.21e+03,2.25e+04]
2 (2.25e+04,5.88e+04]
3 (5.88e+04,2.3e+06]
4 [117,8.21e+03]
results <- list()
for (DruggabilitySetName in DruggabilityCategories){
for (GO_CategoryName in GO_Categories){
print(paste(DruggabilitySetName, GO_CategoryName))
DruggabilitySet <- GeneCategories.totest %>%
filter(Group==DruggabilitySetName) %>% pull(gene)
GO_CategorySet <- GO.genes.of.interest.totest %>%
filter(GO_Name==GO_CategoryName) %>% pull(ensembl_gene_id)
test.results <- data.frame(gene=CodingGenes$gene) %>%
mutate(IsDruggable = gene %in% DruggabilitySet, IsInGO.Set = gene %in% GO_CategorySet) %>%
mutate(IsDruggable = factor(IsDruggable),
IsInGO.Set = factor(IsInGO.Set)) %>%
count(IsDruggable, IsInGO.Set, .drop=F) %>%
pivot_wider(names_from="IsInGO.Set", values_from="n") %>%
column_to_rownames("IsDruggable") %>%
fisher.test() %>% glance()
results[[paste(DruggabilitySetName, GO_CategoryName, sep=";")]] <- test.results %>%
as.data.frame()
}
}
[1] "Tier 1 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 1 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 1 GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 1 GOMF_KINASE_ACTIVITY"
[1] "Tier 1 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 1 2"
[1] "Tier 1 3"
[1] "Tier 1 4"
[1] "Tier 1 1"
[1] "Tier 2 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 2 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 2 GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 2 GOMF_KINASE_ACTIVITY"
[1] "Tier 2 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 2 2"
[1] "Tier 2 3"
[1] "Tier 2 4"
[1] "Tier 2 1"
[1] "Tier 3A GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 3A GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 3A GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 3A GOMF_KINASE_ACTIVITY"
[1] "Tier 3A GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 3A 2"
[1] "Tier 3A 3"
[1] "Tier 3A 4"
[1] "Tier 3A 1"
[1] "Tier 3B GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 3B GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 3B GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 3B GOMF_KINASE_ACTIVITY"
[1] "Tier 3B GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 3B 2"
[1] "Tier 3B 3"
[1] "Tier 3B 4"
[1] "Tier 3B 1"
[1] "SplicingPerturbed GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "SplicingPerturbed GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "SplicingPerturbed GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "SplicingPerturbed GOMF_KINASE_ACTIVITY"
[1] "SplicingPerturbed GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "SplicingPerturbed 2"
[1] "SplicingPerturbed 3"
[1] "SplicingPerturbed 4"
[1] "SplicingPerturbed 1"
[1] "SplicingUnperturbedControl GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_KINASE_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "SplicingUnperturbedControl 2"
[1] "SplicingUnperturbedControl 3"
[1] "SplicingUnperturbedControl 4"
[1] "SplicingUnperturbedControl 1"
results %>%
bind_rows(.id="Druggability_GO") %>%
separate(Druggability_GO, into=c("Druggability", "GO category"), sep=";") %>%
mutate(GO = recode(`GO category`, "GOMF_KINASE_ACTIVITY"="Kinases", "GOMF_NUCLEAR_RECEPTOR_ACTIVITY"="Nuc receptors", "GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"="GPCRs", "GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"="Ion transporters", "GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"="TFs")) %>%
# filter(!GO=="TFs") %>%
# pull(GO) %>% unique()
# mutate(GO = factor(GO, levels=c("1-10kb", "10-50kb", "50-100kb", ">100kb", "GPCRs", "Nuc receptors", "Kinases", "Ion transporters", "TFs", "BrainEnriched_HumanProteinAtlas"))) %>%
filter(Druggability %in% c("Tier 1", "SplicingPerturbed")) %>%
mutate(Druggability = recode(Druggability, "Tier 1"="1059 targets of approved small molecules", "SplicingPerturbed"="219 Risdiplam-induced NMD targets")) %>%
ggplot(aes(x=GO, y=log2(estimate), fill=GO)) +
geom_col() +
geom_errorbar(aes(ymin=log2(conf.low), ymax=log2(conf.high)), width=.2) +
geom_hline(yintercept = 0) +
Rotate_x_labels +
scale_fill_manual(
values=c(
"1"="#b2e2e2",
"2"="#66c2a4",
"3"="#2ca25f",
"4"="#006d2c",
"GPCRs"="#ffffb3",
"Ion transporters"="#bebada",
"Kinases"="#fb8072",
"Nuc receptors"="#80b1d3",
"TFs"="#fdb462"),
labels=c(
"1"="Q1 gene length; <8.2kb",
"2"="Q2 gene length; 8-23kb",
"3"="Q3 gene length; 23-59kb",
"4"="Q4 gene length; >59kb",
"GPCRs"="GPCRs",
"Ion transporters",
"Kinases",
"Nuc receptors"="Nuclear receptors",
"TFs"="Transcription factors"
)) +
# scale_y_continuous(trans="log2") +
facet_wrap(~Druggability, labeller = label_wrap_gen(25)) +
labs(y="Enrichment; log2(OddsRatio)", x="Gene category", fill="Gene category")
Ok, I think I want to repeat that now but also add two more facets: one for “disease genes” (ie OMIM), and another facet for the larger set of post-txn regulated genes. First let’s explore the OMIM dataset quickly…
OMIM <- read_tsv("/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/OMIM/genemap2.txt", skip=3)
#Just dominant genes, search of Huntington's HTT gene, an expected result
OMIM %>%
filter(!is.na(Phenotypes)) %>%
filter(!is.na(`Ensembl Gene ID`)) %>%
filter(str_detect(Phenotypes, "dominant")) %>%
filter(str_detect(Phenotypes, "untington"))
# A tibble: 3 × 14
`# Chromosome` `Genomic Position Start` `Genomic Position End` `Cyto Location`
<chr> <dbl> <dbl> <chr>
1 chr4 3074680 3243959 4p16.3
2 chr16 87601834 87698155 16q24.3
3 chr20 4686455 4701587 20pter-p12
# … with 10 more variables: `Computed Cyto Location` <chr>, `MIM Number` <dbl>,
# `Gene Symbols` <chr>, `Gene Name` <chr>, `Approved Gene Symbol` <chr>,
# `Entrez Gene ID` <dbl>, `Ensembl Gene ID` <chr>, Comments <chr>,
# Phenotypes <chr>, `Mouse Gene Symbol/ID` <chr>
# How many dominant OMIM genes are there?
OMIM %>%
filter(!is.na(Phenotypes)) %>%
filter(!is.na(`Ensembl Gene ID`)) %>%
filter(str_detect(Phenotypes, "dominant")) %>%
distinct(`Ensembl Gene ID`) %>%
nrow()
[1] 1834
GO.genes.of.interest.totest <-
bind_rows(
# filter out olfactor GPCRs from GPCR group
GO.genes.of.interest %>%
filter(ensembl_gene_id %in% CodingGenes$gene) %>%
filter(!GO_Name=="GOMF_OLFACTORY_RECEPTOR_ACTIVITY") %>%
filter(!(GO_Name=="GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY" & !hgnc_symbol %in% GPCRs)),
NumExonsAndLength %>%
filter(gene %in% CodingGenes$gene) %>%
# mutate(GO_Name = cut_number(GeneLength, 5, ordered_result=T)) %>%
mutate(GO_Name = factor(ntile(GeneLength, 4))) %>%
# mutate(GO_Name = cut(GeneLength, breaks=c(-Inf,1E3, 1E4, 1E5, Inf), include.lowest=T)) %>%
filter(!is.na(GO_Name)) %>%
dplyr::select(ensembl_gene_id=gene, GO_Name),
OMIM %>%
filter(!is.na(Phenotypes)) %>%
filter(!is.na(`Ensembl Gene ID`)) %>%
filter(str_detect(Phenotypes, "dominant")) %>%
dplyr::select(ensembl_gene_id = `Ensembl Gene ID`) %>%
distinct() %>%
filter(ensembl_gene_id %in% CodingGenes$gene) %>%
mutate(GO_Name = "OMIM_DomNegative")
)
GO_Categories <- GO.genes.of.interest.totest %>%
pull(GO_Name) %>% unique()
count(GO.genes.of.interest.totest, GO_Name)
GO_Name n
1 1 4621
2 2 4621
3 3 4621
4 4 4621
5 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY 1537
6 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY 489
7 GOMF_KINASE_ACTIVITY 793
8 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY 778
9 GOMF_NUCLEAR_RECEPTOR_ACTIVITY 57
10 OMIM_DomNegative 1823
DruggabilityCategories.totest.df <- GeneCategories %>%
sample_frac() %>%
group_by(Group) %>%
mutate(n = row_number()) %>%
ungroup() %>%
# filter(!Group == "Tier 1" | (Group == "Tier 1" & n <= 116))
bind_rows(
Scatter.dat %>%
filter(Category == "post-txn down") %>%
filter(gene %in% CodingGenes$gene) %>%
dplyr::select(gene) %>%
mutate(Group = "Larger set of risdiplam-effected"),
OMIM %>%
filter(!is.na(Phenotypes)) %>%
filter(!is.na(`Ensembl Gene ID`)) %>%
filter(str_detect(Phenotypes, "dominant")) %>%
dplyr::select(gene = `Ensembl Gene ID`) %>%
distinct() %>%
filter(gene %in% CodingGenes$gene) %>%
mutate(Group = "OMIM_DomNegative")
)
count(DruggabilityCategories.totest.df, Group)
# A tibble: 9 × 2
Group n
<chr> <int>
1 Larger set of risdiplam-effected 627
2 OMIM_DomNegative 1823
3 SplicingPerturbed 219
4 SplicingUnperturbedControl 396
5 Tier 1 1059
6 Tier 2 648
7 Tier 3A 67
8 Tier 3B 516
9 Whole protein coding genome 18484
DruggabilityCategories <- DruggabilityCategories.totest.df %>%
filter(!Group=="Whole protein coding genome") %>% pull(Group) %>% unique()
DruggabilityCategories
[1] "Tier 3B" "Tier 1"
[3] "Tier 2" "SplicingUnperturbedControl"
[5] "SplicingPerturbed" "Tier 3A"
[7] "Larger set of risdiplam-effected" "OMIM_DomNegative"
results <- list()
for (DruggabilitySetName in DruggabilityCategories){
for (GO_CategoryName in GO_Categories){
print(paste(DruggabilitySetName, GO_CategoryName))
DruggabilitySet <- DruggabilityCategories.totest.df %>%
filter(Group==DruggabilitySetName) %>% pull(gene)
GO_CategorySet <- GO.genes.of.interest.totest %>%
filter(GO_Name==GO_CategoryName) %>% pull(ensembl_gene_id)
test.results <- data.frame(gene=CodingGenes$gene) %>%
mutate(IsDruggable = gene %in% DruggabilitySet, IsInGO.Set = gene %in% GO_CategorySet) %>%
mutate(IsDruggable = factor(IsDruggable),
IsInGO.Set = factor(IsInGO.Set)) %>%
count(IsDruggable, IsInGO.Set, .drop=F) %>%
pivot_wider(names_from="IsInGO.Set", values_from="n") %>%
column_to_rownames("IsDruggable") %>%
fisher.test() %>% glance()
results[[paste(DruggabilitySetName, GO_CategoryName, sep=";")]] <- test.results %>%
as.data.frame()
}
}
[1] "Tier 3B GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 3B GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 3B GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 3B GOMF_KINASE_ACTIVITY"
[1] "Tier 3B GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 3B 2"
[1] "Tier 3B 3"
[1] "Tier 3B 4"
[1] "Tier 3B 1"
[1] "Tier 3B OMIM_DomNegative"
[1] "Tier 1 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 1 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 1 GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 1 GOMF_KINASE_ACTIVITY"
[1] "Tier 1 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 1 2"
[1] "Tier 1 3"
[1] "Tier 1 4"
[1] "Tier 1 1"
[1] "Tier 1 OMIM_DomNegative"
[1] "Tier 2 GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 2 GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 2 GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 2 GOMF_KINASE_ACTIVITY"
[1] "Tier 2 GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 2 2"
[1] "Tier 2 3"
[1] "Tier 2 4"
[1] "Tier 2 1"
[1] "Tier 2 OMIM_DomNegative"
[1] "SplicingUnperturbedControl GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_KINASE_ACTIVITY"
[1] "SplicingUnperturbedControl GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "SplicingUnperturbedControl 2"
[1] "SplicingUnperturbedControl 3"
[1] "SplicingUnperturbedControl 4"
[1] "SplicingUnperturbedControl 1"
[1] "SplicingUnperturbedControl OMIM_DomNegative"
[1] "SplicingPerturbed GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "SplicingPerturbed GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "SplicingPerturbed GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "SplicingPerturbed GOMF_KINASE_ACTIVITY"
[1] "SplicingPerturbed GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "SplicingPerturbed 2"
[1] "SplicingPerturbed 3"
[1] "SplicingPerturbed 4"
[1] "SplicingPerturbed 1"
[1] "SplicingPerturbed OMIM_DomNegative"
[1] "Tier 3A GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Tier 3A GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Tier 3A GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Tier 3A GOMF_KINASE_ACTIVITY"
[1] "Tier 3A GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Tier 3A 2"
[1] "Tier 3A 3"
[1] "Tier 3A 4"
[1] "Tier 3A 1"
[1] "Tier 3A OMIM_DomNegative"
[1] "Larger set of risdiplam-effected GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "Larger set of risdiplam-effected GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "Larger set of risdiplam-effected GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "Larger set of risdiplam-effected GOMF_KINASE_ACTIVITY"
[1] "Larger set of risdiplam-effected GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "Larger set of risdiplam-effected 2"
[1] "Larger set of risdiplam-effected 3"
[1] "Larger set of risdiplam-effected 4"
[1] "Larger set of risdiplam-effected 1"
[1] "Larger set of risdiplam-effected OMIM_DomNegative"
[1] "OMIM_DomNegative GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"
[1] "OMIM_DomNegative GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"
[1] "OMIM_DomNegative GOMF_NUCLEAR_RECEPTOR_ACTIVITY"
[1] "OMIM_DomNegative GOMF_KINASE_ACTIVITY"
[1] "OMIM_DomNegative GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"
[1] "OMIM_DomNegative 2"
[1] "OMIM_DomNegative 3"
[1] "OMIM_DomNegative 4"
[1] "OMIM_DomNegative 1"
[1] "OMIM_DomNegative OMIM_DomNegative"
results %>%
bind_rows(.id="Druggability_GO") %>%
separate(Druggability_GO, into=c("Druggability", "GO category"), sep=";") %>%
mutate(GO = recode(`GO category`, "GOMF_KINASE_ACTIVITY"="Kinases", "GOMF_NUCLEAR_RECEPTOR_ACTIVITY"="Nuc receptors", "GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"="GPCRs", "GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"="Ion transporters", "GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"="TFs")) %>%
# filter(!GO=="TFs") %>%
# pull(GO) %>% unique()
# mutate(GO = factor(GO, levels=c("1-10kb", "10-50kb", "50-100kb", ">100kb", "GPCRs", "Nuc receptors", "Kinases", "Ion transporters", "TFs", "BrainEnriched_HumanProteinAtlas"))) %>%
filter(Druggability %in% c("Tier 1", "SplicingPerturbed", "Larger set of risdiplam-effected", "OMIM_DomNegative")) %>%
mutate(Druggability = recode(Druggability, "Tier 1"="1059 targets of approved small molecules", "SplicingPerturbed"="219 Risdiplam-induced NMD targets", "Larger set of risdiplam-effected"="627 post-txn down genes")) %>%
ggplot(aes(x=GO, y=log2(estimate), fill=GO)) +
geom_col() +
geom_errorbar(aes(ymin=log2(conf.low), ymax=log2(conf.high)), width=.2) +
geom_hline(yintercept = 0) +
scale_y_continuous(limits=c(-5,5)) +
Rotate_x_labels +
scale_fill_manual(
values=c(
"1"="#b2e2e2",
"2"="#66c2a4",
"3"="#2ca25f",
"4"="#006d2c",
"GPCRs"="#ffffb3",
"Ion transporters"="#bebada",
"Kinases"="#fb8072",
"Nuc receptors"="#80b1d3",
"TFs"="#fdb462",
"OMIM_DomNegative"="#737373"),
labels=c(
"1"="Q1 gene length; <8.2kb",
"2"="Q2 gene length; 8-23kb",
"3"="Q3 gene length; 23-59kb",
"4"="Q4 gene length; >59kb",
"GPCRs"="GPCRs",
"Ion transporters",
"Kinases",
"Nuc receptors"="Nuclear receptors",
"TFs"="Transcription factors",
"OMIM_DomNegative"="OMIM_DomNegative"
)) +
# scale_y_continuous(trans="log2") +
facet_wrap(~Druggability, labeller = label_wrap_gen(25)) +
labs(y="Enrichment; log2(OddsRatio)", x="Gene category", fill="Gene category")
Ok, so the conclusion is that risdiplam targets are particularly enriched in long genes, and not particularly enriched in GPCRs, kinases, etc. However, interestingly, the small molecule targets are also somewhat enriched in long genes, though not to the extent of risdiplam targets. Looking at OMIM dominant negative genes they are enriched in OMIM doinant negative genes (obviously circular), and perhaps as expected the small molecule targets are also enriched in them, and interestingly risdiplam targets are slightly enriched. But also note that the dominant negative OMIM genes are also slightly enriched in long genes and gpcrs, kinases, etc. So in a way the existing small molecule set is already more ‘matched’ for OMIM disease gene types… Maybe a different way to present this is just as a stacked bar plots with different categories…
But first let’s save a version of this plot for final fig.
P.genesetEnrichment <- results %>%
bind_rows(.id="Druggability_GO") %>%
separate(Druggability_GO, into=c("Druggability", "GO category"), sep=";") %>%
mutate(GO = recode(`GO category`, "GOMF_KINASE_ACTIVITY"="Kinases", "GOMF_NUCLEAR_RECEPTOR_ACTIVITY"="Nuc receptors", "GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"="GPCRs", "GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"="Ion transporters", "GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"="TFs")) %>%
# filter(!GO=="TFs") %>%
# pull(GO) %>% unique()
# mutate(GO = factor(GO, levels=c("1-10kb", "10-50kb", "50-100kb", ">100kb", "GPCRs", "Nuc receptors", "Kinases", "Ion transporters", "TFs", "BrainEnriched_HumanProteinAtlas"))) %>%
filter(Druggability %in% c("Tier 1", "SplicingPerturbed", "Larger set of risdiplam-effected")) %>%
mutate(Druggability = recode(Druggability, "Tier 1"="1059 targets of approved small molecules", "SplicingPerturbed"="219 Risdiplam-induced NMD targets", "Larger set of risdiplam-effected"="627 post-txn down genes")) %>%
filter(!GO %in% c("OMIM_DomNegative")) %>%
ggplot(aes(x=GO, y=log2(estimate), fill=GO)) +
geom_col() +
geom_errorbar(aes(ymin=log2(conf.low), ymax=log2(conf.high)), width=.2) +
geom_hline(yintercept = 0) +
scale_y_continuous(limits=c(-5,5)) +
Rotate_x_labels +
scale_fill_manual(
values=c(
"1"="#b2e2e2",
"2"="#66c2a4",
"3"="#2ca25f",
"4"="#006d2c",
"GPCRs"="#ffffb3",
"Ion transporters"="#bebada",
"Kinases"="#fb8072",
"Nuc receptors"="#80b1d3",
"TFs"="#fdb462"),
labels=c(
"1"="Q1 gene length; <8.2kb",
"2"="Q2 gene length; 8-23kb",
"3"="Q3 gene length; 23-59kb",
"4"="Q4 gene length; >59kb",
"GPCRs"="GPCRs",
"Ion transporters",
"Kinases",
"Nuc receptors"="Nuclear receptors",
"TFs"="Transcription factors"
)) +
# scale_y_continuous(trans="log2") +
scale_x_discrete(labels=c("Q1 gene length", "Q2 gene length", "Q3 gene length", "Q4 gene length", "GPCRs",
"Ion transporters",
"Kinases",
"Nuc receptors",
"TFs")) +
facet_wrap(~Druggability, labeller = label_wrap_gen(25)) +
labs(y="Enrichment; log2(OddsRatio)", x="Gene category", fill="Gene category")
P.genesetEnrichment
And also let’s print the table of p values and stuff for reference…
results %>%
bind_rows(.id="Druggability_GO") %>%
separate(Druggability_GO, into=c("Druggability", "GO category"), sep=";") %>%
mutate(GO = recode(`GO category`, "GOMF_KINASE_ACTIVITY"="Kinases", "GOMF_NUCLEAR_RECEPTOR_ACTIVITY"="Nuc receptors", "GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"="GPCRs", "GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"="Ion transporters", "GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"="TFs")) %>%
dplyr::select(-method, -`GO category`) %>%
kable()
Druggability | estimate | p.value | conf.low | conf.high | alternative | GO |
---|---|---|---|---|---|---|
Tier 3B | 9.1584316 | 0.0000000 | 7.2645120 | 11.4817690 | two.sided | Ion transporters |
Tier 3B | 17.8318534 | 0.0000000 | 14.0484213 | 22.5384382 | two.sided | GPCRs |
Tier 3B | 7.1777300 | 0.0000393 | 2.9207456 | 15.3880137 | two.sided | Nuc receptors |
Tier 3B | 4.2068880 | 0.0000000 | 3.1530710 | 5.5375236 | two.sided | Kinases |
Tier 3B | 0.2148774 | 0.0000000 | 0.0920764 | 0.4277921 | two.sided | TFs |
Tier 3B | 1.0360823 | 0.7393335 | 0.8273790 | 1.2882130 | two.sided | 2 |
Tier 3B | 0.9381724 | 0.6175399 | 0.7439220 | 1.1736752 | two.sided | 3 |
Tier 3B | 1.1252871 | 0.2674937 | 0.9034546 | 1.3925325 | two.sided | 4 |
Tier 3B | 1.5953272 | 0.0000049 | 1.3049055 | 1.9423057 | two.sided | 1 |
Tier 3B | 1.5695556 | 0.0017286 | 1.1759211 | 2.0639941 | two.sided | OMIM_DomNegative |
Tier 1 | 4.5775591 | 0.0000000 | 3.7197576 | 5.6022199 | two.sided | Ion transporters |
Tier 1 | 10.0233817 | 0.0000000 | 8.1091459 | 12.3478953 | two.sided | GPCRs |
Tier 1 | 19.0191977 | 0.0000000 | 10.8351038 | 33.2253463 | two.sided | Nuc receptors |
Tier 1 | 18.6713523 | 0.0000000 | 15.8462886 | 21.9655357 | two.sided | Kinases |
Tier 1 | 0.4073325 | 0.0000000 | 0.2742487 | 0.5844331 | two.sided | TFs |
Tier 1 | 1.1224319 | 0.1361653 | 0.9627033 | 1.3047237 | two.sided | 2 |
Tier 1 | 1.4346905 | 0.0000011 | 1.2415966 | 1.6542055 | two.sided | 3 |
Tier 1 | 2.0315724 | 0.0000000 | 1.7744165 | 2.3226804 | two.sided | 4 |
Tier 1 | 0.6106577 | 0.0000000 | 0.5070732 | 0.7310714 | two.sided | 1 |
Tier 1 | 4.6770866 | 0.0000000 | 4.0250797 | 5.4242085 | two.sided | OMIM_DomNegative |
Tier 2 | 3.3065440 | 0.0000000 | 2.4867452 | 4.3360520 | two.sided | Ion transporters |
Tier 2 | 3.7628017 | 0.0000000 | 2.6857913 | 5.1671221 | two.sided | GPCRs |
Tier 2 | 6.5159324 | 0.0000294 | 2.7985334 | 13.5045955 | two.sided | Nuc receptors |
Tier 2 | 7.2893233 | 0.0000000 | 5.8415702 | 9.0413296 | two.sided | Kinases |
Tier 2 | 0.5740051 | 0.0040124 | 0.3706802 | 0.8527182 | two.sided | TFs |
Tier 2 | 1.2090959 | 0.0469397 | 0.9980052 | 1.4579726 | two.sided | 2 |
Tier 2 | 1.6053487 | 0.0000002 | 1.3420347 | 1.9141582 | two.sided | 3 |
Tier 2 | 1.4228886 | 0.0001555 | 1.1836140 | 1.7040751 | two.sided | 4 |
Tier 2 | 0.7421310 | 0.0053825 | 0.5937480 | 0.9201179 | two.sided | 1 |
Tier 2 | 2.4238705 | 0.0000000 | 1.9429889 | 3.0018796 | two.sided | OMIM_DomNegative |
SplicingUnperturbedControl | 0.3653269 | 0.0156537 | 0.1176105 | 0.8637518 | two.sided | Ion transporters |
SplicingUnperturbedControl | 0.1160682 | 0.0039433 | 0.0029283 | 0.6534926 | two.sided | GPCRs |
SplicingUnperturbedControl | 0.0000000 | 1.0000000 | 0.0000000 | 3.8681414 | two.sided | Nuc receptors |
SplicingUnperturbedControl | 1.5204465 | 0.0908983 | 0.9126993 | 2.3993212 | two.sided | Kinases |
SplicingUnperturbedControl | 0.7482665 | 0.2601520 | 0.4503874 | 1.1769642 | two.sided | TFs |
SplicingUnperturbedControl | 1.2800803 | 0.0415533 | 1.0025661 | 1.6223144 | two.sided | 2 |
SplicingUnperturbedControl | 1.5278082 | 0.0003469 | 1.2093794 | 1.9186500 | two.sided | 3 |
SplicingUnperturbedControl | 1.5680826 | 0.0001267 | 1.2430408 | 1.9668070 | two.sided | 4 |
SplicingUnperturbedControl | 0.9730301 | 0.8987106 | 0.7466031 | 1.2551631 | two.sided | 1 |
SplicingUnperturbedControl | 1.1430022 | 0.4497635 | 0.7808881 | 1.6262270 | two.sided | OMIM_DomNegative |
SplicingPerturbed | 0.9386204 | 1.0000000 | 0.3715479 | 1.9773767 | two.sided | Ion transporters |
SplicingPerturbed | 0.0000000 | 0.0163396 | 0.0000000 | 0.7741248 | two.sided | GPCRs |
SplicingPerturbed | 0.0000000 | 1.0000000 | 0.0000000 | 6.9654439 | two.sided | Nuc receptors |
SplicingPerturbed | 1.9206458 | 0.0245324 | 1.0268418 | 3.3166145 | two.sided | Kinases |
SplicingPerturbed | 0.5246647 | 0.0763309 | 0.2231702 | 1.0557286 | two.sided | TFs |
SplicingPerturbed | 0.5538893 | 0.0029441 | 0.3552937 | 0.8326365 | two.sided | 2 |
SplicingPerturbed | 1.8721591 | 0.0000414 | 1.3860027 | 2.5082922 | two.sided | 3 |
SplicingPerturbed | 4.5451953 | 0.0000000 | 3.4485599 | 5.9978310 | two.sided | 4 |
SplicingPerturbed | 0.1103190 | 0.0000000 | 0.0399805 | 0.2443146 | two.sided | 1 |
SplicingPerturbed | 2.1468543 | 0.0002109 | 1.4387333 | 3.1190947 | two.sided | OMIM_DomNegative |
Tier 3A | 19.8449432 | 0.0000000 | 11.6405941 | 33.3450155 | two.sided | Ion transporters |
Tier 3A | 5.4098961 | 0.0005572 | 2.0754929 | 11.9337986 | two.sided | GPCRs |
Tier 3A | 19.7590682 | 0.0006250 | 3.8539321 | 63.4995983 | two.sided | Nuc receptors |
Tier 3A | 5.5396300 | 0.0000175 | 2.6064320 | 10.7419787 | two.sided | Kinases |
Tier 3A | 0.6510472 | 0.6269370 | 0.1306695 | 1.9937555 | two.sided | TFs |
Tier 3A | 1.1419718 | 0.6476907 | 0.5965028 | 2.0619601 | two.sided | 2 |
Tier 3A | 0.8630289 | 0.7608965 | 0.4202778 | 1.6327580 | two.sided | 3 |
Tier 3A | 2.8520798 | 0.0000493 | 1.6883620 | 4.7608384 | two.sided | 4 |
Tier 3A | 0.5357802 | 0.1253387 | 0.2209096 | 1.1271720 | two.sided | 1 |
Tier 3A | 3.0707135 | 0.0006864 | 1.5702242 | 5.6275106 | two.sided | OMIM_DomNegative |
Larger set of risdiplam-effected | 0.6431856 | 0.1165465 | 0.3478389 | 1.0943909 | two.sided | Ion transporters |
Larger set of risdiplam-effected | 0.3620125 | 0.0160427 | 0.1165987 | 0.8552979 | two.sided | GPCRs |
Larger set of risdiplam-effected | 0.0000000 | 0.4098360 | 0.0000000 | 2.3765859 | two.sided | Nuc receptors |
Larger set of risdiplam-effected | 1.6256628 | 0.0103286 | 1.1067487 | 2.3176707 | two.sided | Kinases |
Larger set of risdiplam-effected | 0.4746449 | 0.0002571 | 0.2907583 | 0.7349793 | two.sided | TFs |
Larger set of risdiplam-effected | 0.9349390 | 0.5449357 | 0.7576516 | 1.1461573 | two.sided | 2 |
Larger set of risdiplam-effected | 1.8919248 | 0.0000000 | 1.5862322 | 2.2508137 | two.sided | 3 |
Larger set of risdiplam-effected | 2.1640447 | 0.0000000 | 1.8214016 | 2.5655820 | two.sided | 4 |
Larger set of risdiplam-effected | 0.4578311 | 0.0000000 | 0.3484867 | 0.5929993 | two.sided | 1 |
Larger set of risdiplam-effected | 1.0247280 | 0.8809431 | 0.7508390 | 1.3722556 | two.sided | OMIM_DomNegative |
OMIM_DomNegative | 3.0516993 | 0.0000000 | 2.5247921 | 3.6715647 | two.sided | Ion transporters |
OMIM_DomNegative | 1.1765324 | 0.3101460 | 0.8425683 | 1.6078825 | two.sided | GPCRs |
OMIM_DomNegative | 5.3809014 | 0.0000002 | 2.8908877 | 9.6572225 | two.sided | Nuc receptors |
OMIM_DomNegative | 2.3184865 | 0.0000000 | 1.8913974 | 2.8247566 | two.sided | Kinases |
OMIM_DomNegative | 2.0352455 | 0.0000000 | 1.7399303 | 2.3727446 | two.sided | TFs |
OMIM_DomNegative | 0.9246512 | 0.2124116 | 0.8163187 | 1.0452143 | two.sided | 2 |
OMIM_DomNegative | 1.2483949 | 0.0001601 | 1.1121383 | 1.3993429 | two.sided | 3 |
OMIM_DomNegative | 2.0875980 | 0.0000000 | 1.8789638 | 2.3174869 | two.sided | 4 |
OMIM_DomNegative | 0.8581283 | 0.0162214 | 0.7556052 | 0.9723993 | two.sided | 1 |
OMIM_DomNegative | Inf | 0.0000000 | 8402.0512839 | Inf | two.sided | OMIM_DomNegative |
DruggabilityCategories
[1] "Tier 3B" "Tier 1"
[3] "Tier 2" "SplicingUnperturbedControl"
[5] "SplicingPerturbed" "Tier 3A"
[7] "Larger set of risdiplam-effected" "OMIM_DomNegative"
GroupLabels <- c("OMIM_DomNegative"="Disease genes", "Whole genome"="Whole genome", "Tier 1"="1059 targets of approved small molecules", "SplicingPerturbed"="219 Risdiplam-induced NMD targets", "Larger set of risdiplam-effected"="627 post-txn down genes")
P.enrichmentByBarplot <- bind_rows(
DruggabilityCategories.totest.df %>%
filter(Group %in% c("Tier 1", "OMIM_DomNegative", "SplicingPerturbed", "Larger set of risdiplam-effected")),
CodingGenes %>%
mutate(Group = "Whole genome")
) %>%
left_join(
GO.genes.of.interest.totest %>%
filter(str_detect(GO_Name, "GOMF")),
by=c("gene"="ensembl_gene_id")
) %>%
distinct(gene, Group, .keep_all=T) %>%
replace_na(list(GO_Name="Other")) %>%
mutate(GO = recode(`GO_Name`, "GOMF_KINASE_ACTIVITY"="Kinases", "GOMF_NUCLEAR_RECEPTOR_ACTIVITY"="Nuc receptors", "GOMF_G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY"="GPCRs", "GOMF_MONOATOMIC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY"="Ion transporters", "GOMF_DNA_BINDING_TRANSCRIPTION_FACTOR_ACTIVITY"="TFs")) %>%
count(Group, GO) %>%
mutate(Group = recode(Group, !!!GroupLabels)) %>%
mutate(Group = factor(Group, levels=c("1059 targets of approved small molecules", "219 Risdiplam-induced NMD targets", "627 post-txn down genes", "Whole genome", "Disease genes"))) %>%
filter(!Group=="627 post-txn down genes") %>%
ggplot(aes(x=Group, y=n, fill=GO)) +
geom_col(position="fill") +
scale_fill_manual(
values=c(
"GPCRs"="#ffffb3",
"Ion transporters"="#bebada",
"Kinases"="#fb8072",
"Nuc receptors"="#80b1d3",
"TFs"="#fdb462",
"Other"="#737373"),
labels=c(
"GPCRs"="GPCRs",
"Ion transporters",
"Kinases",
"Nuc receptors"="Nuclear receptors",
"TFs"="Transcription factors",
"Other"="Other"
)) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 22), limits=rev) +
scale_y_continuous(expand=c(0,0)) +
labs(x=NULL, y="Fraction", fill="Gene category", caption=str_wrap("Disease genes are 1823 genes with dominant negative phenotypes in OMIM", 30)) +
coord_flip()
P.enrichmentByBarplot
And also for reference, let’s print a list of the intersection of the OMIM genes and the NMD target sets… This isn’t the strongest filtering (for example, to really follow up on a potential target I would definitely manually inspect the data a lot closer), it’s just a quick reference.
OMIM.risdiplam.genes <- P.scatter.byPrediction.dat %>%
filter(FDR_polyA < 0.1) %>%
filter(logFC_polyA < -1) %>%
filter(ColorEffect == "NMD") %>%
dplyr::rename(DeltaPSI=Estimate) %>%
inner_join(
OMIM %>%
filter(!is.na(Phenotypes)) %>%
filter(!is.na(`Ensembl Gene ID`)) %>%
filter(str_detect(Phenotypes, "dominant")) %>%
dplyr::select("MIM Number":"Phenotypes"),
by=c("gene"="Ensembl Gene ID")
)
kable(OMIM.risdiplam.genes)
gene | dose.nM | logFC_chRNA | logFC_polyA | PValue_chRNA | PValue_polyA | FDR_chRNA | FDR_polyA | Category | ColorEffect | MergedEffect | junc | DeltaPSI | MIM Number | Gene Symbols | Gene Name | Approved Gene Symbol | Entrez Gene ID | Comments | Phenotypes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000078403 | 3160 | 0.6755900 | -1.288923 | 0.0000034 | 0.0000000 | 0.0001318 | 0.0000002 | post-txn down | NMD | Frame shifting | chr10:21728675:21730900:+ | 0.7762386 | 602409 | AF10 | ALL1 fused gene from chromosome 10 | MLLT10 | 8028 | fuses with MLL and HEAB | Leukemia, acute myeloid, 601626 (3), Somatic mutation, Autosomal dominant |
ENSG00000033030 | 3160 | 0.6288341 | -1.961917 | 0.0000090 | 0.0000000 | 0.0002713 | 0.0000000 | post-txn down | NMD | Frame shifting | chr12:122478292:122478664:- | 0.7376685 | 616381 | ZCCHC8, PFBMFT5 | Zinc finger CCHC domain-containing protein 8 | ZCCHC8 | 55596 | mutation identified in 1 PFBMFT5 family | ?Pulmonary fibrosis and/or bone marrow failure, telomere-related, 5, 618674 (3), Autosomal dominant |
ENSG00000115415 | 3160 | 0.4927000 | -4.357511 | 0.0005217 | 0.0000000 | 0.0059284 | 0.0000000 | post-txn down | NMD | NMD | chr2:190977025:190978528:- | 0.9243423 | 600555 | STAT1, CANDF7, IMD31A, IMD31B, IMD31C | Signal transducer and activator of transcription-1 | STAT1 | 6772 | NA | Immunodeficiency 31C, chronic mucocutaneous candidiasis, autosomal dominant, 614162 (3), Autosomal dominant; Immunodeficiency 31A, mycobacteriosis, autosomal dominant, 614892 (3), Autosomal dominant; Immunodeficiency 31B, mycobacterial and viral infections, autosomal recessive, 613796 (3), Autosomal recessive |
ENSG00000103335 | 3160 | 0.5567517 | -1.099484 | 0.0005570 | 0.0015190 | 0.0062411 | 0.0087129 | post-txn down | NMD | Frame shifting | chr16:88720532:88720660:- | 1.0025870 | 611184 | PIEZO1, FAM38A, MIB, DHS, LMPHM6 | PIEZO1 ion channel | PIEZO1 | 9780 | NA | Lymphatic malformation 6, 616843 (3), Autosomal recessive; Dehydrated hereditary stomatocytosis with or without pseudohyperkalemia and/or perinatal edema, 194380 (3), Autosomal dominant |
ENSG00000100485 | 3160 | 0.3876746 | -1.104077 | 0.0009561 | 0.0086641 | 0.0094387 | 0.0339973 | post-txn down | NMD | Frame shifting | chr14:50130762:50133808:- | 0.9987109 | 601247 | SOS2, NS9 | SOS Ras/Rac guanine nucleotide exchange factor 2 | SOS2 | 6655 | NA | Noonan syndrome 9, 616559 (3), Autosomal dominant |
ENSG00000096093 | 3160 | 0.0248003 | -1.004033 | 0.8927372 | 0.0278749 | 0.9771466 | 0.0810435 | post-txn down | NMD | Frame shifting | chr6:52428243:52438304:+ | 0.2162461 | 608815 | EFHC1, FLJ10466, EJM1, JAE, EJA1 | EF hand domain (C-terminal)-containing 1 | EFHC1 | 114327 | NA | {Epilepsy, juvenile absence, susceptibility to, 1}, 607631 (3), Autosomal dominant; {Myoclonic epilepsy, juvenile, susceptibility to, 1}, 254770 (3), Autosomal dominant |
ENSG00000197386 | 3160 | NA | -1.836470 | NA | 0.0017826 | NA | 0.0099649 | Other | NMD | Frame shifting | chr4:3213736:3213958:+ | 0.9595212 | 613004 | HTT, HD, IT15, LOMARS | Huntingtin | HTT | 3064 | distal to D4S10 | Lopes-Maciel-Rodan syndrome, 617435 (3), Autosomal recessive; Huntington disease, 143100 (3), Autosomal dominant |
ENSG00000159063 | 3160 | NA | -2.969738 | NA | 0.0000000 | NA | 0.0000000 | Other | NMD | Frame shifting | chr11:78106946:78108970:- | 0.2456783 | 608103 | ALG8, CDG1H, PCLD3 | ALG8 alpha-1,3-glucosyltransferase | ALG8 | 79053 | NA | Congenital disorder of glycosylation, type Ih, 608104 (3), Autosomal recessive; Polycystic liver disease 3 with or without kidney cysts, 617874 (3), Autosomal dominant |
ENSG00000083799 | 3160 | NA | -1.214614 | NA | 0.0000207 | NA | 0.0002209 | Other | NMD | In-frame PTC | chr16:50775855:50776179:+ | 0.8050513 | 605018 | CYLD, CDMT, EAC, MFT1, KIAA0849, BRSS, FTDALS8 | CYLD lysine-63 deubiquitinase | CYLD | 1540 | mutation identified in 1 FTDALS8 family | Brooke-Spiegler syndrome, 605041 (3), Autosomal dominant; Cylindromatosis, familial, 132700 (3), Autosomal dominant; Trichoepithelioma, multiple familial, 1, 601606 (3), Autosomal dominant; ?Frontotemporal dementia and/or amyotrophic lateral sclerosis 8, 619132 (3), Autosomal dominant |
ENSG00000154864 | 3160 | NA | -1.163771 | NA | 0.0057753 | NA | 0.0250624 | Other | NMD | Frame shifting | chr18:10702171:10702546:- | 0.9428523 | 613629 | PIEZO2, FAM38B, DA5, DA3, MWKS, DAIPT | PIEZO-type mechanosensitive ion channel component 2 | PIEZO2 | 63895 | mutation identified in 1 MWKS patient | Arthrogryposis, distal, type 5, 108145 (3), Autosomal dominant; Arthrogryposis, distal, with impaired proprioception and touch, 617146 (3), Autosomal recessive; Arthrogryposis, distal, type 3, 114300 (3), Autosomal dominant; ?Marden-Walker syndrome, 248700 (3), Autosomal dominant |
ENSG00000154864 | 3160 | NA | -1.163771 | NA | 0.0057753 | NA | 0.0250624 | Other | NMD | Frame shifting | chr18:10911228:10963755:- | 0.0729154 | 613629 | PIEZO2, FAM38B, DA5, DA3, MWKS, DAIPT | PIEZO-type mechanosensitive ion channel component 2 | PIEZO2 | 63895 | mutation identified in 1 MWKS patient | Arthrogryposis, distal, type 5, 108145 (3), Autosomal dominant; Arthrogryposis, distal, with impaired proprioception and touch, 617146 (3), Autosomal recessive; Arthrogryposis, distal, type 3, 114300 (3), Autosomal dominant; ?Marden-Walker syndrome, 248700 (3), Autosomal dominant |
ENSG00000154864 | 3160 | NA | -1.163771 | NA | 0.0057753 | NA | 0.0250624 | Other | NMD | Frame shifting | chr18:10702171:10702546:- | 0.9428523 | 613629 | PIEZO2, FAM38B, DA5, DA3, MWKS, DAIPT | PIEZO-type mechanosensitive ion channel component 2 | PIEZO2 | 63895 | mutation identified in 1 MWKS patient | Arthrogryposis, distal, type 5, 108145 (3), Autosomal dominant; Arthrogryposis, distal, with impaired proprioception and touch, 617146 (3), Autosomal recessive; Arthrogryposis, distal, type 3, 114300 (3), Autosomal dominant; ?Marden-Walker syndrome, 248700 (3), Autosomal dominant |
ENSG00000154864 | 3160 | NA | -1.163771 | NA | 0.0057753 | NA | 0.0250624 | Other | NMD | Frame shifting | chr18:10911228:10963755:- | 0.0729154 | 613629 | PIEZO2, FAM38B, DA5, DA3, MWKS, DAIPT | PIEZO-type mechanosensitive ion channel component 2 | PIEZO2 | 63895 | mutation identified in 1 MWKS patient | Arthrogryposis, distal, type 5, 108145 (3), Autosomal dominant; Arthrogryposis, distal, with impaired proprioception and touch, 617146 (3), Autosomal recessive; Arthrogryposis, distal, type 3, 114300 (3), Autosomal dominant; ?Marden-Walker syndrome, 248700 (3), Autosomal dominant |
Now let’s plot the ecdf of gene length nicer for possible publication…
GeneLength.dat <- GeneCategories %>%
left_join(NumExonsAndLength) %>%
filter(!Group %in% c("Tier 2", "Tier 3A", "Tier 3B")) %>%
mutate(GeneLength = GeneLength/1E3)
GeneLength.dat %>% count(Group)
# A tibble: 4 × 2
Group n
<chr> <int>
1 SplicingPerturbed 219
2 SplicingUnperturbedControl 396
3 Tier 1 1059
4 Whole protein coding genome 18484
GeneLength.dat %>%
filter(Group %in% c("SplicingPerturbed", "SplicingUnperturbedControl")) %>%
wilcox.test(GeneLength~Group,data = .)
Wilcoxon rank sum test with continuity correction
data: GeneLength by Group
W = 60572, p-value = 3.458e-16
alternative hypothesis: true location shift is not equal to 0
P.geneLength <- ggplot(GeneLength.dat, aes(x=GeneLength, color=Group)) +
stat_ecdf() +
scale_x_continuous(trans='log10') +
annotation_logticks() +
coord_cartesian(xlim=c(1, 500)) +
scale_color_manual(
labels=c(
"SplicingPerturbed"="219 Risdiplam-induced NMD targets",
"SplicingUnperturbedControl"="389 expression-matched genes",
"Whole protein coding genome"="Whole genome",
"Tier 1"="1059 targets of approved small molecules"),
values=c(
"SplicingPerturbed"="#e31a1c",
"SplicingUnperturbedControl"="#1f78b4",
"Whole protein coding genome"="#000000",
"Tier 1"="#737373")
) +
labs(y="ecdf", color="Druggability gene group", x="Gene length (kb)", caption="Splicing perturbed to control P: 3.4x10^-15")
P.geneLength
Even though the point is that longer genes are more likely to have a splicing change due to chance, I kind of want to repeat this analysis in a way that isn’t dependent on splicing and has no testing difference in long vs short genes… That is, by looking at post-txn down genes, they should be longer than expression matched controls…
SplicingDruggableGenes_ByPostTxn <- Scatter.dat %>%
filter(Category == "post-txn down") %>%
filter(gene %in% CodingGenes$gene) %>%
dplyr::select(gene) %>%
distinct(gene, .keep_all=T)
Expression.tidyForMatchingGenes <- Expression.table.genes %>%
mutate(IsSplicingDruggable = gene %in% SplicingDruggableGenes_ByPostTxn$gene) %>%
arrange(medianTPM) %>%
mutate(LaggingGeneGroup = lag(IsSplicingDruggable)) %>%
mutate(LeadingGeneGroup = lead(IsSplicingDruggable)) %>%
ungroup()
Merged.WithExpressionMatchedControlGenes <-
bind_rows(
Expression.tidyForMatchingGenes %>%
filter(IsSplicingDruggable) %>%
mutate(Group = "SplicingPerturbed"),
Expression.tidyForMatchingGenes %>%
filter(!IsSplicingDruggable & LaggingGeneGroup) %>%
mutate(Group = "SplicingUnperturbedControl"),
Expression.tidyForMatchingGenes %>%
filter(!IsSplicingDruggable & LeadingGeneGroup) %>%
mutate(Group = "SplicingUnperturbedControl"),
) %>%
dplyr::select(gene, medianTPM, Group)
Merged.WithExpressionMatchedControlGenes %>%
ggplot(aes(medianTPM, color=Group)) +
stat_ecdf() +
scale_x_continuous(trans='log10')
GeneLength.dat <- GeneCategories %>%
filter(Group %in% c("Tier 1", "Whole protein coding genome")) %>%
bind_rows(Merged.WithExpressionMatchedControlGenes) %>%
left_join(NumExonsAndLength) %>%
mutate(GeneLength = GeneLength/1E3)
GeneLength.dat %>%
count(Group)
# A tibble: 4 × 2
Group n
<chr> <int>
1 SplicingPerturbed 604
2 SplicingUnperturbedControl 1142
3 Tier 1 1059
4 Whole protein coding genome 18484
GeneLength.dat %>%
filter(Group %in% c("SplicingPerturbed", "SplicingUnperturbedControl")) %>%
wilcox.test(GeneLength~Group,data = .)
Wilcoxon rank sum test with continuity correction
data: GeneLength by Group
W = 427220, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
P.geneLength.FromPostTxnSet <- ggplot(GeneLength.dat, aes(x=GeneLength, color=Group)) +
stat_ecdf() +
scale_x_continuous(trans='log10') +
annotation_logticks() +
coord_cartesian(xlim=c(1, 500)) +
scale_color_manual(
labels=c(
"SplicingPerturbed"="604 Risdiplam-induced post-txn down genes",
"SplicingUnperturbedControl"="1142 expression-matched genes",
"Whole protein coding genome"="Whole genome",
"Tier 1"="1059 targets of approved small molecules"),
values=c(
"SplicingPerturbed"="#e31a1c",
"SplicingUnperturbedControl"="#1f78b4",
"Whole protein coding genome"="#000000",
"Tier 1"="#737373")
) +
labs(y="ecdf", color="Druggability gene group", x="Gene length (kb)", caption="Splicing perturbed to control P<2.2x10^-16")
P.geneLength.FromPostTxnSet
I think the last thing I want to show are some examples of dose response curves…
Let’s start by doing all the genes that intersect the OMIM short list.
OMIM.risdiplam.genes
# A tibble: 13 × 20
gene dose.nM logFC_chRNA logFC_polyA PValue_chRNA PValue_polyA FDR_chRNA
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ENSG0000… 3160 0.676 -1.29 0.00000339 9.04e- 9 0.000132
2 ENSG0000… 3160 0.629 -1.96 0.00000902 9.49e-12 0.000271
3 ENSG0000… 3160 0.493 -4.36 0.000522 1.42e-36 0.00593
4 ENSG0000… 3160 0.557 -1.10 0.000557 1.52e- 3 0.00624
5 ENSG0000… 3160 0.388 -1.10 0.000956 8.66e- 3 0.00944
6 ENSG0000… 3160 0.0248 -1.00 0.893 2.79e- 2 0.977
7 ENSG0000… 3160 NA -1.84 NA 1.78e- 3 NA
8 ENSG0000… 3160 NA -2.97 NA 1.38e-18 NA
9 ENSG0000… 3160 NA -1.21 NA 2.07e- 5 NA
10 ENSG0000… 3160 NA -1.16 NA 5.78e- 3 NA
11 ENSG0000… 3160 NA -1.16 NA 5.78e- 3 NA
12 ENSG0000… 3160 NA -1.16 NA 5.78e- 3 NA
13 ENSG0000… 3160 NA -1.16 NA 5.78e- 3 NA
# … with 13 more variables: FDR_polyA <dbl>, Category <chr>, ColorEffect <chr>,
# MergedEffect <chr>, junc <chr>, DeltaPSI <dbl>, `MIM Number` <dbl>,
# `Gene Symbols` <chr>, `Gene Name` <chr>, `Approved Gene Symbol` <chr>,
# `Entrez Gene ID` <dbl>, Comments <chr>, Phenotypes <chr>
Expression.tidyForDose_ActualDataForModel <- read_tsv("../code/SmallMolecule/FitModels/Data/polyA_genes.tsv.gz")
#Calculate factors to visually normalize chRNA and polyA to the same value at dose.nM==0
DiffFactors <- Expression.tidyForDose_ActualDataForModel %>%
filter(dose.nM == 0) %>%
group_by(Geneid, LibraryType) %>%
summarise(MeanLog2CPM=mean(Log2CPM)) %>%
ungroup() %>%
pivot_wider(names_from="LibraryType", values_from="MeanLog2CPM") %>%
mutate(DiffFactor = polyA - chRNA)
Splicing.tidyForDose_ActualDataForModel <- read_tsv("../code/SmallMolecule/CassetteExons/TidyPSI.tsv.gz") %>%
inner_join(
P.boxplot.dat %>%
filter(dataset=="polyA") %>%
dplyr::select(GAGTjunc=GAGTInt, gene, ColorEffect, MergedEffect, ExonStart_Cassette, ExonStop_Cassette) %>%
unite(Exon, ExonStart_Cassette,ExonStop_Cassette, sep="-")
)
P.OMIM.splicing.Effections <- Splicing.tidyForDose_ActualDataForModel %>%
filter(gene %in% OMIM.risdiplam.genes$gene) %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id")) %>%
mutate(chrom = str_extract(GAGTjunc, "^chr.+?:")) %>%
mutate(facet = str_glue("{hgnc_symbol}\n{chrom}{Exon}")) %>%
# filter(libType=="polyA") %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=PSI, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0) + 0.1, labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) + facet_wrap(~facet, scales="free_y") +
Rotate_x_labels +
labs(x="[risdiplam] (nanomolar)", y="Splicing of indicated NMD exon; PSI")
P.OMIM.splicing.Effections
P.OMIM.expression.Effections <- Expression.tidyForDose_ActualDataForModel %>%
dplyr::rename(libType = "LibraryType") %>%
filter(ensembl_gene_id %in% OMIM.risdiplam.genes$gene) %>%
left_join(ensembl_to_symbols) %>%
left_join(DiffFactors) %>%
mutate(Log2CPM = case_when(
libType == "polyA" ~ Log2CPM,
libType == "chRNA" ~ Log2CPM + DiffFactor
)) %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=Log2CPM, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
facet_wrap(~hgnc_symbol, scales="free_y") +
Rotate_x_labels +
# scale_x_break(c(1,2)) +
labs(x="[risdiplam] (nanomolar)", y=expression("Expression; "*log["2"]*"CPM"))
P.OMIM.expression.Effections
Expression.tidyForDose_ActualDataForModel %>%
dplyr::rename(libType = "LibraryType") %>%
# filter(ensembl_gene_id %in% OMIM.risdiplam.genes$gene) %>%
left_join(ensembl_to_symbols) %>%
filter(hgnc_symbol %in% c("HTT", "MYB", "STAT1")) %>%
left_join(DiffFactors) %>%
mutate(Log2CPM = case_when(
libType == "polyA" ~ Log2CPM,
libType == "chRNA" ~ Log2CPM + DiffFactor
)) %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=Log2CPM, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
facet_wrap(~hgnc_symbol, scales="free_y") +
Rotate_x_labels +
# scale_x_break(c(1,2)) +
labs(x="[risdiplam] (nanomolar)", y=expression("Expression; "*log["2"]*"CPM"))
One more version with just log2FC on y axis so we don’t have to artifiially/misleadingly set different CPM scale for chRNA.
Expression.tidyForDose_ActualDataForModel.Relative <- Expression.tidyForDose_ActualDataForModel %>%
left_join(
Expression.tidyForDose_ActualDataForModel %>%
filter(dose.nM == 0) %>%
group_by(LibraryType, Geneid) %>%
summarise(MeanNormalCPM = mean(Log2CPM, na.rm=T))
) %>%
mutate()
P.OMIM.expression.Effections <- Expression.tidyForDose_ActualDataForModel.Relative %>%
dplyr::rename(libType = "LibraryType") %>%
filter(ensembl_gene_id %in% OMIM.risdiplam.genes$gene) %>%
left_join(ensembl_to_symbols) %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=Log2CPM-MeanNormalCPM, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) + facet_wrap(~hgnc_symbol, scales="free_y") +
Rotate_x_labels +
labs(x="[risdiplam] (nanomolar)", y=expression("Expression "*log["2"]*"FC"))
P.OMIM.expression.Effections
Get just the stat1 data and save
P.stat1.expression <- Expression.tidyForDose_ActualDataForModel.Relative %>%
dplyr::rename(libType = "LibraryType") %>%
left_join(ensembl_to_symbols) %>%
filter(hgnc_symbol == "STAT1") %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=Log2CPM-MeanNormalCPM, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
# Rotate_x_labels +
scale_x_break(c(2,2)) +
theme(axis.text.x.top = element_blank(),
axis.ticks.x.top = element_blank(),
axis.line.x.top = element_blank(),
plot.background = element_blank()) +
Rotate_x_labels +
labs(x="[risdiplam] (nanomolar)", y=expression("STAT1 expression; "*log["2"]*"FC"))
P.stat1.expression
P.stat1.splicing <- Splicing.tidyForDose_ActualDataForModel %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id")) %>%
filter(hgnc_symbol == "STAT1") %>%
mutate(chrom = str_extract(GAGTjunc, "^chr.+?:")) %>%
mutate(facet = str_glue("{hgnc_symbol}\n{chrom}{Exon}")) %>%
# filter(libType=="polyA") %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=PSI, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
labs(x="[risdiplam] (nanomolar)", y="Splicing of\nNMD-targeting exon; PSI") +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
# Rotate_x_labels +
scale_x_break(c(2,2)) +
theme(axis.text.x.top = element_blank(),
axis.ticks.x.top = element_blank(),
axis.line.x.top = element_blank(),
plot.background = element_blank()) +
Rotate_x_labels
P.stat1.splicing
Get just myb data and save
P.myb.expression <- Expression.tidyForDose_ActualDataForModel.Relative %>%
dplyr::rename(libType = "LibraryType") %>%
left_join(ensembl_to_symbols) %>%
filter(hgnc_symbol == "MYB") %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=Log2CPM-MeanNormalCPM, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525"), labels=c("chRNA"="naRNA", "polyA"="steady-state")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
# Rotate_x_labels +
# scale_x_break(c(2,2)) +
theme(axis.text.x.top = element_blank(),
axis.ticks.x.top = element_blank(),
axis.line.x.top = element_blank(),
plot.background = element_blank()) +
Rotate_x_labels +
labs(x="[risdiplam] (nanomolar)", y=expression("MYB expression; "*log["2"]*"FC"))
P.myb.expression
P.myb.splicing <- Splicing.tidyForDose_ActualDataForModel %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id")) %>%
filter(gene == "ENSG00000142599") %>%
# filter(hgnc_symbol == "MYB") %>%
mutate(chrom = str_extract(GAGTjunc, "^chr.+?:")) %>%
mutate(facet = str_glue("{hgnc_symbol}\n{chrom}{Exon}")) %>%
# filter(libType=="polyA") %>%
# mutate(dose.nM = dose.nM + 0.1) %>%
ggplot(aes(x=dose.nM, y=PSI, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525"), labels=c("chRNA"="naRNA", "polyA"="steady-state")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
labs(x="[risdiplam] (nanomolar)", y="Splicing of\nNMD-targeting exon; PSI") +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
# Rotate_x_labels +
# scale_x_break(c(2,2)) +
theme(axis.text.x.top = element_blank(),
axis.ticks.x.top = element_blank(),
axis.line.x.top = element_blank(),
plot.background = element_blank()) +
Rotate_x_labels
P.myb.splicing
P.myb.splicing <- ggplotify::as.ggplot(print(P.myb.splicing))
P.myb.expression <- ggplotify::as.ggplot(print(P.myb.expression))
Check SRSF10 for Jon
SpearmanCorr %>%
distinct(Intron, .keep_all=T) %>%
filter(substr(DonorSeq, 3,4)=="GA") %>%
filter(q<0.1)
# A tibble: 641 × 15
Intron Leafcutter.ID DonorSeq DonorScore treatment dose.nM Cell.type
<chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
1 chr1:2189781:2… chr1:2189781… CAGAGTA… 4.53 DMSO 0 LCL
2 chr1:8365974:8… chr1:8365974… AAGAGTA… 5.52 DMSO 0 LCL
3 chr1:18921121:… chr1:1892112… AGGAGTA… 6.27 DMSO 0 LCL
4 chr1:23863271:… chr1:2386327… AGGAGTA… 2.87 DMSO 0 LCL
5 chr1:23875662:… chr1:2387566… AAGAGTA… 4.83 DMSO 0 LCL
6 chr1:28151059:… chr1:2815105… AGGAGTA… 5.30 DMSO 0 LCL
7 chr1:28172527:… chr1:2817252… AGGAGTA… 5.30 DMSO 0 LCL
8 chr1:37493814:… chr1:3749381… GAGAGTA… 5.59 DMSO 0 LCL
9 chr1:41028726:… chr1:4102872… AAGAGTA… 6.42 DMSO 0 LCL
10 chr1:44313228:… chr1:4431322… TAGAGTA… 4.50 DMSO 0 LCL
# … with 631 more rows, and 8 more variables: LibraryType <chr>, rep <dbl>,
# PSI <dbl>, Counts <dbl>, n <dbl>, spearman <dbl>, spearman.p <dbl>, q <dbl>
SpearmanCorr %>%
distinct(Intron, .keep_all=T) %>%
filter(substr(DonorSeq, 3,4)=="GA") %>%
separate(Intron, into=c("chrom", "start", "stop", "strand"), sep=":") %>%
dplyr::select(chrom, start, stop, strand, Leafcutter.ID, spearman, strand) %>%
arrange(chrom, start, stop) %>%
write_tsv("../code/scratch/AllGAGTInts.bed", col_names = F)
Expression.tidyForDose_ActualDataForModel %>%
filter(ensembl_gene_id == "ENSG00000188529") %>%
dplyr::rename(libType=LibraryType) %>%
ggplot(aes(x=dose.nM, y=Log2CPM, color=libType)) +
geom_point() +
scale_color_manual(values=c("chRNA"="#969696", "polyA"="#252525"), labels=c("chRNA"="naRNA", "polyA"="steady-state")) +
geom_smooth(
data = . %>%
filter(libType=="polyA"),
method = drm, method.args = list(fct = L.4(names=c("Steepness", "LowerLimit", "UpperLimit", "ED50"))), se = FALSE) +
labs(x="[risdiplam] (nanomolar)", y="Splicing of\nNMD-targeting exon; PSI") +
scale_x_continuous(trans="log1p", limits=c(0, 10000), breaks=c(10000, 3160, 1000, 316, 100, 31.6, 10, 3.16, 0), labels=rev(c("0", "3.16","10","31.6","100", "316", "1000", "3160", "10000"))) +
# Rotate_x_labels +
# scale_x_break(c(2,2)) +
theme(axis.text.x.top = element_blank(),
axis.ticks.x.top = element_blank(),
axis.line.x.top = element_blank(),
plot.background = element_blank()) +
Rotate_x_labels
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_Bar.pdf", P.bar, height=4, width=9)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_BarSimpler.pdf", P.bar.simpler, height=3, width=8)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_BarSimpler.Fixed.pdf", P.bar.simpler.fixed, height=1.5, width=4)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_BoxLarge.pdf", P.box.small, height=3, width=4)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_BoxSmall.pdf", P.box, height=3, width=4)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_chRNA_polyA_DE_scatter.pdf",P.scatter, height=6, width=8)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_MeanDeltaPSI.pdf",P.MeanDeltaPSI, height=3, width=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_DosePSI_SpearmanDist.pdf",P.spearman.dist, height=6, width=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_PSI_v_DE_ByPrediction.pdf",P.scatter.byPrediction, height=5, width=8)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_PSI_v_DE_ByPrediction_chRNA.pdf",P.scatter.byPrediction.chRNA, height=5, width=8)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_GeneSetEnrichment.pdf",P.genesetEnrichment, height=5, width=10)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_GeneSetEnrichment_QualitativeBarplot.pdf",P.enrichmentByBarplot, height=4, width=6)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_GeneLength.pdf",P.geneLength, height=3, width=7)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_GeneLength_NumExons.pdf",P.NumExons, height=3, width=7)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_GeneLength_LengthHostIntrons.pdf",P.IntLen, height=3, width=5)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_GeneLength_FromPostTxnSet.pdf",P.geneLength.FromPostTxnSet, height=3, width=7)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_stat1_SplicingDoseResponse.pdf",P.stat1.splicing, height=3.5, width=4.5)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_stat1_ExpressionDoseResponse.pdf",P.stat1.expression, height=3.5, width=4.5)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_myb_SplicingDoseResponse.pdf",P.myb.splicing, height=3.5, width=4.5)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_myb_ExpressionDoseResponse.pdf",P.myb.expression, height=3.5, width=4.5)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_OMIM_SplicingDoseResponse.pdf",P.OMIM.splicing.Effections, height=9, width=15)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_OMIM_ExpresionDoseResponse.pdf",P.OMIM.expression.Effections, height=9, width=15)
ggsave("/project2/yangili1/carlos_and_ben_shared/rough_figs/OriginalSubplots/2023_FinalFig4_MA.pdf",P.MA, height=3.5, width=6)
I kind of think it will be simpler presentation (and possibly cleaner results) if i just use my own interpretations of cassette exons instead of using Gencode. Let’s try…
P.bar.dat <- Translated.GAGT_CassetteExons %>%
mutate(IsAnnotated = if_else(str_detect(SuperAnnotation,"Annotated"), "Annotated by Gencode", "Unannotated", missing="Unannotated")) %>%
mutate(MergedEffect = case_when(
IsAnnotated == "Annotated by Gencode" & SemiSupergroupAnnotations=="basic tag" ~ "Productive",
IsAnnotated == "Annotated by Gencode" & !SemiSupergroupAnnotations=="basic tag" ~ "NMD",
TRUE ~ Effect
)) %>%
#HERE IS WHERE I RECODE TO DISREGARD GENCODE ANNOTATIONS
mutate(OldMergedEffect = MergedEffect) %>%
mutate(MergedEffect = Effect) %>%
mutate(ColorEffect = if_else(MergedEffect %in% c("Productive", "Preserves frame", "Cassette in UTR"), "Productive", "NMD") )
P.bar <- P.bar.dat %>%
count(MergedEffect, ColorEffect) %>%
mutate(MergedEffect=factor(MergedEffect, levels=c("Cassette in UTR","Preserves frame", "In-frame PTC", "Frame shifting"))) %>%
mutate(y="y") %>%
# mutate(IsAnnotated = recode(IsAnnotated, "Annotated by Gencode"="Annotated")) %>%
ggplot(aes(y=y, x=n, fill=MergedEffect, color=ColorEffect)) +
geom_col(position="stack", size=3) +
geom_text( aes(label=n), color='black', position=position_stack(vjust=0.5)) +
scale_color_manual(values=c(
"Productive" = "#1f78b4",
"NMD" = "#e31a1c"
)) +
scale_fill_manual(values=c(
# "Productive"="#1f78b4",
# "NMD"="#e31a1c",
"Frame shifting"="#fb6a4a",
"Cassette in UTR"="#deebf7",
"In-frame PTC"="#fcbba1",
"Preserves frame"="#a6cee3"
),
labels=c(
# "Productive"="In annotated productive transcript",
# "NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame"
)) +
scale_x_continuous(expand=c(0,0)) +
Rotate_x_labels +
guides(fill = guide_legend(order = 2),
color = guide_legend(order = 1)) +
theme(legend.box="vertical", legend.position = "bottom") +
theme(
strip.background = element_blank(),
strip.text.x = element_blank()
) +
labs(title="Predicted transcript effects of induced cassete exons", x="Number risdiplam-induced cassette exons", fill="Reason", y=NULL, color="Transcript effect")
P.bar
P.boxplot.dat <- P.bar.dat %>%
dplyr::select(-symbol) %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1")) %>%
left_join(
volcano.dat %>%
filter(dose.nM == 3160) %>%
distinct(gene, dose.nM, dataset, .keep_all=T) %>%
dplyr::select(gene, dose.nM, dataset, logFC) %>%
pivot_wider(names_from="dataset", values_from="logFC"),
by="gene"
) %>%
left_join(ensembl_to_symbols, by=c("gene"="ensembl_gene_id")) %>%
# filter(hgnc_symbol %in% c("HTT", "SMN2", "GALC", "STAT1", "KCNT2", "FHOD3", "FOXM1))
dplyr::select(GAGTInt, ExonStart_Cassette, ExonStop_Cassette, gene, hgnc_symbol, chRNA, polyA, ColorEffect, MergedEffect, IsAnnotated, OldMergedEffect) %>%
gather("dataset", "log2FC", chRNA, polyA)
P.box.small <- P.boxplot.dat %>%
mutate(ColorEffect = factor(ColorEffect, levels=c("Productive", "NMD"))) %>%
ggplot(aes(y=log2FC, x=ColorEffect, color=ColorEffect)) +
geom_hline(yintercept=0) +
geom_boxplot(outlier.shape=NA, alpha=1, fill="white") +
geom_segment(x="Productive", xend="NMD", y=2.5, yend=2.5, color='black') +
geom_point(position=position_jitterdodge(jitter.width=0.5), alpha=0.3) +
geom_text(data = . %>%
group_by(dataset) %>%
do(w = wilcox.test(log2FC~ColorEffect, data=., paired=FALSE)) %>%
summarise(dataset, Wilcox = format.pval(w$p.value, digits=3)) %>%
ungroup() %>%
mutate(label=str_glue("P={Wilcox}")),
aes(label=label, group=NULL),
x=1.5, y=2.5, color='black', vjust=-1) +
scale_color_manual(values=c( "#1f78b4","#e31a1c")) +
facet_wrap(~dataset) +
labs(y=expression("Host gene "*log["2"]*"FC"), x="Predicted transcript effect") +
Rotate_x_labels +
theme(legend.position="none")
P.box.small
P.scatter.byPrediction.dat <- Scatter.dat %>%
filter(dose.nM==3160) %>%
inner_join(
P.boxplot.dat %>%
filter(dataset=="polyA") %>%
dplyr::select(gene, ColorEffect, MergedEffect, OldMergedEffect),
by="gene"
) %>%
inner_join(
PolyA.SplicingEffects.Annotationed %>%
filter(param=="Pred_3160") %>%
dplyr::select(junc, gene, Estimate) %>%
mutate(gene = str_replace(gene, "(^.+?)\\..+?$", "\\1")),
by="gene"
)
MergedEffectLabels <- c(
"Productive"="In annotated productive transcript",
"NMD"="In annotated NMD transcript",
"Frame shifting"="Frame shifting",
"Cassette in UTR"="Cassette in UTR",
"In-frame PTC"="In-frame PTC",
"Preserves frame"="Preserves frame",
"All NMD"="All predicted NMD exons",
"All productive"="All predicted productive exons"
)
P.scatter.byPrediction <- bind_rows(
P.scatter.byPrediction.dat %>%
mutate(color=OldMergedEffect),
P.scatter.byPrediction.dat %>%
mutate(color=OldMergedEffect) %>%
mutate(MergedEffect=recode(ColorEffect, !!!c("NMD"="All NMD", "Productive"="All productive")))
) %>%
mutate(MergedEffect = factor(MergedEffect, levels=c("Preserves frame", "Cassette in UTR", "All productive", "Frame shifting", "In-frame PTC", "All NMD"))) %>%
distinct(gene,MergedEffect, .keep_all=T) %>%
ggplot(aes(x=Estimate, y=logFC_polyA, color=color)) +
geom_hline(yintercept=0) +
geom_point() +
scale_color_manual(values=c(
"Productive"="#1f78b4",
"NMD"="#e31a1c",
"Frame shifting"="#fb9a99",
"Cassette in UTR"="#a6cee3",
"In-frame PTC"="#fb9a99",
"Preserves frame"="#a6cee3"
),
labels=MergedEffectLabels) +
geom_text( data = . %>%
group_by(MergedEffect) %>%
summarise(cor=cor.test(Estimate,logFC_polyA)[["estimate"]], pval=cor.test(Estimate,logFC_polyA, method='s')[["p.value"]]) %>%
mutate(R = signif(cor, 3), P=format.pval(pval, 3)) %>%
mutate(label = str_glue("rho:{R}\nP:{P}")),
aes(x=-Inf, y=-Inf, label=label),
hjust=-.1, vjust=-0.1, color='black'
) +
# facet_wrap(~MergedEffect, labeller = as_labeller(MergedEffectLabels)) +
facet_wrap(~MergedEffect, nrow=2, labeller = as_labeller(MergedEffectLabels, default=label_wrap_gen(20))) +
theme(legend.position="none") +
labs(y=expression("host gene "*log["2"]*"FC, polyA RNA"), x=expression(Delta*"PSI"), caption="[risdiplam]=3160nM, Effects measured in polyA RNA", title="Colored by annotations when available, faceted by my predictions") +
Rotate_x_labels
P.scatter.byPrediction
#First see how my annotations match up with Gencode and Yang's annotations
Translated.GAGT_CassetteExons %>%
mutate(IsAnnotated = if_else(str_detect(SuperAnnotation,"Annotated"), "Annotated by Gencode", "Annotated by Yang")) %>%
replace_na(list(SemiSupergroupAnnotations="Unannotated by either", IsAnnotated="Unannotated by either")) %>%
count(Effect, IsAnnotated, SemiSupergroupAnnotations) %>%
ggplot(aes(x=Effect, y=n, fill=SemiSupergroupAnnotations)) +
geom_col(position="stack") +
facet_wrap(~IsAnnotated, scales="free_y", nrow = 3) +
Rotate_x_labels +
labs(title="Comparing my annotations to Gencode/Yang", x="My annotation by comparing translations", fill="Gencode/Yang's annotation")
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] broom_0.8.0 ggnewscale_0.4.8 biomaRt_2.52.0 ggbreak_0.1.1
[5] drc_3.0-1 MASS_7.3-56 knitr_1.39 ggrepel_0.9.1
[9] qvalue_2.28.0 readxl_1.4.0 edgeR_3.38.4 limma_3.52.4
[13] data.table_1.14.2 RColorBrewer_1.1-3 forcats_0.5.1 stringr_1.4.0
[17] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[21] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] backports_1.4.1 workflowr_1.7.0 BiocFileCache_2.4.0
[4] plyr_1.8.7 splines_4.2.0 GenomeInfoDb_1.32.2
[7] TH.data_1.1-1 digest_0.6.29 yulab.utils_0.0.6
[10] htmltools_0.5.2 fansi_1.0.3 magrittr_2.0.3
[13] memoise_2.0.1 tzdb_0.3.0 Biostrings_2.64.0
[16] modelr_0.1.8 R.utils_2.11.0 vroom_1.5.7
[19] sandwich_3.0-1 prettyunits_1.1.1 colorspace_2.0-3
[22] blob_1.2.3 rvest_1.0.2 rappdirs_0.3.3
[25] haven_2.5.0 xfun_0.30 crayon_1.5.1
[28] RCurl_1.98-1.7 jsonlite_1.8.0 survival_3.3-1
[31] zoo_1.8-10 glue_1.6.2 gtable_0.3.0
[34] zlibbioc_1.42.0 XVector_0.36.0 car_3.1-1
[37] BiocGenerics_0.42.0 abind_1.4-5 scales_1.2.0
[40] mvtnorm_1.1-3 DBI_1.1.2 Rcpp_1.0.8.3
[43] plotrix_3.8-2 progress_1.2.2 gridGraphics_0.5-1
[46] bit_4.0.4 stats4_4.2.0 httr_1.4.3
[49] ellipsis_0.3.2 R.methodsS3_1.8.1 pkgconfig_2.0.3
[52] XML_3.99-0.14 farver_2.1.0 sass_0.4.1
[55] dbplyr_2.1.1 locfit_1.5-9.7 utf8_1.2.2
[58] ggplotify_0.1.0 tidyselect_1.1.2 labeling_0.4.2
[61] rlang_1.0.2 reshape2_1.4.4 later_1.3.0
[64] AnnotationDbi_1.58.0 munsell_0.5.0 cellranger_1.1.0
[67] tools_4.2.0 cachem_1.0.6 cli_3.3.0
[70] generics_0.1.2 RSQLite_2.2.14 evaluate_0.15
[73] fastmap_1.1.0 yaml_2.3.5 bit64_4.0.5
[76] fs_1.5.2 KEGGREST_1.36.2 nlme_3.1-157
[79] whisker_0.4 R.oo_1.24.0 aplot_0.1.10
[82] xml2_1.3.3 compiler_4.2.0 rstudioapi_0.13
[85] filelock_1.0.2 curl_4.3.2 png_0.1-7
[88] reprex_2.0.1 bslib_0.3.1 stringi_1.7.6
[91] highr_0.9 lattice_0.20-45 Matrix_1.5-3
[94] vctrs_0.4.1 pillar_1.7.0 lifecycle_1.0.1
[97] jquerylib_0.1.4 bitops_1.0-7 httpuv_1.6.5
[100] patchwork_1.1.1 R6_2.5.1 promises_1.2.0.1
[103] IRanges_2.30.0 codetools_0.2-18 gtools_3.9.2
[106] assertthat_0.2.1 rprojroot_2.0.3 withr_2.5.0
[109] multcomp_1.4-19 S4Vectors_0.34.0 GenomeInfoDbData_1.2.8
[112] mgcv_1.8-40 parallel_4.2.0 hms_1.1.1
[115] grid_4.2.0 ggfun_0.0.9 rmarkdown_2.14
[118] carData_3.0-5 git2r_0.30.1 Biobase_2.56.0
[121] lubridate_1.8.0