Entering edit mode
4.5 years ago
aroso491
•
0
Hello!
I have a little script that allows me to create overlapped Manhattan plots with ggplot2 between 2 different dataframes (SNPs and CpGs). I have interesting columns in 22 chromosomes and intend to get 22 plots. However, I was thinking that repeating the same code over and over 22 times was a bit of an overkill, but I do not know if it is possible to do it a bit more compact with R.
This is the code fragment I will have to repeat over and over if I want this to work:
#PLOT OF CHROMOSOME 1
chr1_SCPG_computed <- ch1_SCPG %>%
# Compute chromosome size
group_by(CHR) %>%
summarise(chr_len=max(POSITION))%>%
# Calculate cumulative position of each chromosome
mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
dplyr::select(-chr_len) %>%
# Add this info to the initial dataset
left_join(ch1_SCPG, ., by=c("CHR"="CHR")) %>%
# Add a cumulative position of each SNP
arrange(CHR, POSITION) %>%
mutate( BPcum=POSITION+tot)
axisC1_2 = chr1_SCPG_computed %>% group_by(CHR) %>% summarize(center=(max(BPcum) + min(BPcum) ) / 2 )
axisC1_2$CHR <- as.numeric(axisC1_2$CHR)
ggplot()+
geom_point(data=chr1_SSNP_computed, aes(x=BPcum, y=-log10(P)), colour = "red", size=1.3, alpha=0.5, shape=1)+
geom_point(data=chr1_SCPG_computed, aes(x=BPcum, y=-log10(P)), colour = "dark green", size=1.3, alpha=0.5, shape=1)+
geom_hline(yintercept = -log10(sig), color ="red", linetype = "dashed")+
scale_x_continuous(label= axisC1$CHR, breaks = axisC1$center)+
scale_y_continuous(expand = c(0, 0))+
theme_bw() +
theme(
legend.position="none",
panel.border = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()
)
Any ideas would be appreciated, thanks!
Writing
for
loops in RImportant tip for using ggplot in a
for
loopAlso look at the
*apply
family of functions in R, they are sometimes faster and easier to use then writing a loop.Do you have any object containing 'ch1_SCPG'-like pieces, such as dataframe or list? Also, will something else change in iterations in addition to the 'ch1_SCPG'-like piece?
Yes, this is my dataframe ch1_SCPG. The columns are (ignoring the first column): ID CHR POSITION P STUDY_ID and the idea is that I will be able to do this for all chromosomes (1 to 22). I just don't see how can I loop it as it is a chunky piece of code and it confuses me.
I meant actually whether there is one object (list or table) holding the data for different chromosomes. As I understand, ch1_SCPG contains the data for the first chromosome - how did you get this table, by loading some separate file or by filtering some large table with data for all chromosomes?
Also, left_join(ch1_SCPG, ., by=c("CHR"="CHR")) actually performs right join - to make it clear, right_join(ch1_SCPG, "CHR") may be used explicitly.