Plotting Density Of Snps On Chromosomes
7
10
Entering edit mode
12.7 years ago
Leszek 4.2k

Do you know any tools for plotting SNPs density on chromosomes?

snp snp plot • 41k views
ADD COMMENT
0
Entering edit mode

HI, I just have a bed file ,not snp . I wanna view genes and bed region genome distribuition . my bed file is like this : chr1 222 333 Gene1 chr2 223 344 Gene2 .....

ADD REPLY
23
Entering edit mode
11.8 years ago
Irsan ★ 7.8k

I prefer a ggplot2-solution:

Download snp-table from ucsc

And then in R:

snps<-read.table("snp132_ucsc_hg19.bed",sep="\t",header=F)
colnames(snps)<-c("chr","start","end","id","score","strand")

# put the chromosomes in the good order: chr1, chr2, chr22, chrX
goodChrOrder <- paste("chr",c(1:22,"X","Y"),sep="")
snps$chr <- factor(snps$chr,levels=goodChrOrder)

# Plot the densities of snps in the bed file for each chr seperately
library(ggplot2)
snpDensity<-ggplot(snps) + 
geom_histogram(aes(x=start),binwidth=1e6) + # pick a binwidth that is not too small 
facet_wrap(~ chr,ncol=2) + # seperate plots for each chr, x-scales can differ from chr to chr
ggtitle("Density of SNP-132 across hg19") +
xlab("Position in the genome") + 
ylab("SNP density") + 
theme_bw() # I prefer the black and white theme

# save the plot to .png file
png("snp132_density.png",500,750)
print(snpDensity)
dev.off()

enter image description here

ADD COMMENT
0
Entering edit mode

Hi Irsan If I have 2 conditions normal and diseased and i want to plot the same way the densities chr(1-Y) normal on one side and chr(1-Y) diseased on other side will it be possible? (in your plot its chr1 on one side and chr10 on other side)

ADD REPLY
2
Entering edit mode

Yes, prepare your data in such a way that you have a data frame where each row represents a feature (gene/transcript/variant) containing columns: chromosome, position, group. Group contains information whether it is normal or disease. Then do:

ggplot(yourData) + 
geom_histogram(aes(x=position),binwidth=1e6) +
facet_grid(group ~ chr)
ADD REPLY
0
Entering edit mode

Wow! thank you. ill do that

ADD REPLY
2
Entering edit mode

and if you want your chromosomes to be sorted in the right way make sure your chromosome column is a factor with levels in the right order:

data$chromosome <- factor(data$chromosome, levels = paste("chr",c(1:22,"X","Y"),sep=""))
ADD REPLY
0
Entering edit mode

Hi Irsan,
I know that this is a old post but, could you think that could be possible to perform this kind of plot but having as input the coverage for each position for each chromosome? Or maybe the files are too big to upload to R? Also I would like to plot a using geom_line instead of histogram, what do you think?
Thanks in advance

ADD REPLY
0
Entering edit mode
it doesnt make sense to make a histogram for each base pair in the genome because you cant see it. Before plotting and importing data in R/python or anything, first aggregate your coverage into bins of for example 100 kbps
ADD REPLY
0
Entering edit mode

Yes, you're right. I'm just trying to figure out how to plot the coverage across all bases for each chromosome using a line plot. And of course, I'm having problems of memory. So I tried to use your approach for SNPs but it does not work due to the "infinite" number of points in the data.

ADD REPLY
0
Entering edit mode

Hello,

am wondering whether this way is useful also for my data. Basically I have a bed file with Genomic Regions. I gave a score to all of these regions and its value range from -10 to 10. So the bed file looks like this

Chr Start End Score
chr1 1 400 -3
chr1 5000 5400 +10
etc..

I would like to plot the score position for position. Can I use this script?

Thanks for any reply.

ADD REPLY
0
Entering edit mode

I know this is old and something could change in libraries but is there anybody to help with that error message when trying to recreate this: Error: StatBin requires a continuous x variable the x variable is discrete. Perhaps you want stat="count"? Thanks in advance

ADD REPLY
0
Entering edit mode

hello isran, i am vinay kumar reddy, i have vcf file of samples with different snp positions on different chromosomes, so now i want to plot this positions along chromosomes, i am trying to do it with synbreed r package and i could not succeed, can you please help me out. my vcf files consists several columns but first column is of different chromosomes and second column is of different positions on all chromosomes.

thanks, vinay kumar reddy

ADD REPLY
19
Entering edit mode
12.7 years ago
Philippe ★ 1.9k

Hi,

I don't know if such a program exists but I created a function from a small R script I wrote to address such problems of visualization. I think this can cover your need. Here is the function with some examples to test it.

ChrDensity <- function(ChrLength, Positions, Values, ChromosomeBgColor="gray90", ChromosomeBorderColor="black", ValuesColors="black", PlottingGenes=F, GeneColors="black", ylab="Density", xlab="Chromosomal position (Mb)", ...) {

    PositionsMb <- Positions / 1000000
    ChrLengthMb <- ChrLength / 1000000

    plot(NA,NA, xlab=xlab, ylab=ylab , bty="n", xlim=c(0,ChrLengthMb), ylim=c(-1.1,max(Values)), ...)
    rect(0, -0.1, ChrLengthMb,-1, col=ChromosomeBgColor, border=ChromosomeBorderColor)
    nbG <- length(Positions)
    segments(PositionsMb, rep(0,nbG), PositionsMb, Values, col=ValuesColors)
    if(PlottingGenes) {
        segments(PositionsMb, rep(-0.1,nbG), PositionsMb, rep(-1,nbG), col=GeneColors)
    }

}


### Example
ChrLength <- 100000000 # Chromosome of 100 Mb
Positions <- sample(1:100000000, 1000, replace=F)
Values <- sample(1:100 / 10, 1000, replace=T)

# Simple Call
ChrDensity(ChrLength, Positions, Values, main="Test plot")


# Plotting genes and adding specific colors for subset of genes

Colors <- rep("black", length(Positions))
Colors[order(Positions)[200:400]] <- "red"
ChrDensity(ChrLength, Positions, Values, PlottingGenes=T, ValuesColors=Colors, GeneColors=Colors, main="Test plot", ylab="Nb of SNPs")

And here is an example of plots it generates.

Basic plot alt text

Enhanced plot, plotting gene positions and coloring a subset of those. alt text

This can be of course improved. Let me know if you have some suggestions or encounter some bugs.

ADD COMMENT
0
Entering edit mode

+1 that's useful. I'm python freak, so I will try to code similar stuff using pyplot. thanks for the idea!

ADD REPLY
0
Entering edit mode

The figures are gone!

ADD REPLY
10
Entering edit mode
11.8 years ago

ascii art and mysql :-)

   $ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
    mysql> select chrom,chromStart,count(*),repeat('#',(count(*)/50000.0)*10) from snp137
        where chrom in("chr1","chr2","chr3","chr4","chr5","chr6")  group by chrom,ROUND(chromStart/10E6) ;
    +-------+------------+----------+------------------------------------------------------+
    | chrom | chromStart | count(*) | repeat('#',(count(*)/50000.0)*10)                    |
    +-------+------------+----------+------------------------------------------------------+
    | chr1  |      10144 |   115192 | #######################                              |
    | chr1  |    5000009 |   195026 | #######################################              |
    | chr1  |   15000014 |   203190 | #########################################            |
    | chr1  |   25000076 |   172068 | ##################################                   |
    | chr1  |   35000040 |   177274 | ###################################                  |
    | chr1  |   45000036 |   167085 | #################################                    |
    | chr1  |   55000027 |   182151 | ####################################                 |
    | chr1  |   65000016 |   183073 | #####################################                |
    | chr1  |   75000010 |   182679 | #####################################                |
    | chr1  |   85000178 |   179926 | ####################################                 |
    | chr1  |   95000113 |   176064 | ###################################                  |
    | chr1  |  105000003 |   182362 | ####################################                 |
    | chr1  |  115000059 |   132044 | ##########################                           |
    | chr1  |  142535438 |   115694 | #######################                              |
    | chr1  |  145000015 |   205771 | #########################################            |
    | chr1  |  155000030 |   188877 | ######################################               |
    | chr1  |  165000030 |   181378 | ####################################                 |
    | chr1  |  175000030 |   177862 | ####################################                 |
    | chr1  |  185000003 |   183398 | #####################################                |
    | chr1  |  195000018 |   182054 | ####################################                 |
    | chr1  |  205000033 |   209352 | ##########################################           |
    | chr1  |  215000047 |   179814 | ####################################                 |
    | chr1  |  225000055 |   187002 | #####################################                |
    | chr1  |  235000110 |   195962 | #######################################              |
    | chr1  |  245000033 |    93682 | ###################                                  |
    | chr2  |      10132 |   105475 | #####################                                |
    | chr2  |    5000051 |   190281 | ######################################               |
    | chr2  |   15000021 |   177880 | ####################################                 |
    | chr2  |   25000059 |   188206 | ######################################               |
    | chr2  |   35000011 |   212628 | ###########################################          |
    | chr2  |   45000016 |   206472 | #########################################            |
    | chr2  |   55000150 |   181265 | ####################################                 |
    | chr2  |   65000062 |   180104 | ####################################                 |
    | chr2  |   75000066 |   198265 | ########################################             |
    | chr2  |   85000017 |   168042 | ##################################                   |
    | chr2  |   95326292 |   167528 | ##################################                   |
    | chr2  |  105000006 |   185158 | #####################################                |
    | chr2  |  115000210 |   185350 | #####################################                |
    | chr2  |  125000054 |   235282 | ###############################################      |
    | chr2  |  135000003 |   182175 | ####################################                 |
    | chr2  |  145000126 |   172680 | ###################################                  |
    | chr2  |  155000007 |   167622 | ##################################                   |
    | chr2  |  165000065 |   192062 | ######################################               |
    | chr2  |  175000114 |   175360 | ###################################                  |
    | chr2  |  185000008 |   167461 | #################################                    |
    | chr2  |  195000029 |   162383 | ################################                     |
    | chr2  |  205000055 |   179149 | ####################################                 |
    | chr2  |  215000027 |   184243 | #####################################                |
    | chr2  |  225000030 |   188300 | ######################################               |
    | chr2  |  235000034 |   167767 | ##################################                   |
    | chr3  |      60112 |   120153 | ########################                             |
    | chr3  |    5000009 |   204211 | #########################################            |
   (...)
    | chr6  |   55000001 |   149141 | ##############################                       |
    | chr6  |   65000008 |   194800 | #######################################              |
    | chr6  |   75000078 |   198436 | ########################################             |
    | chr6  |   85000026 |   190050 | ######################################               |
    | chr6  |   95000043 |   174975 | ###################################                  |
    | chr6  |  105000058 |   175326 | ###################################                  |
    | chr6  |  115000032 |   182749 | #####################################                |
    | chr6  |  125000145 |   177503 | ####################################                 |
    | chr6  |  135000000 |   175307 | ###################################                  |
    | chr6  |  145000003 |   189212 | ######################################               |
    | chr6  |  155000006 |   195229 | #######################################              |
    | chr6  |  165000019 |   132148 | ##########################                           |
    +-------+------------+----------+------------------------------------------------------+
    128 rows in set (2 min 17.24 sec)
ADD COMMENT
1
Entering edit mode
12.7 years ago
Houkto ▴ 220

Hello there,

I am trying to plot SNP density using circos which can do the same thing. However, both circos and your plot are asking for an input called value i.e

ChrLength, Positions, Values and in circos (Chromosome) hsX (genomic position) 70700000 70799999 (value) 0.00156?

I am not sure what is this value and how to obtain it, any suggestion ?

Thanks

ADD COMMENT
1
Entering edit mode
9.2 years ago
Ric ▴ 440

Hi, I just wonder whether any updates about SNP density plots have been published.

I have few chromosome VCF files i.e. each chromosome is in a separately VCF file. I like @Irsan solution, but just wonder whether it would be possible:

  • to put real numbers on the x-axis
  • to stretch the y-axis so more details would be visible
  • overall to increase the width of the plots so that 2 plots fill an A4 (portrait) page
  • to replace BED file by VDF file?

Thank you in advance

Mic

ADD COMMENT
0
Entering edit mode
11.8 years ago
Ulrich • 0

Hej, does anyone know an R package that can plot SNP densities like in the attached paper (figure 2, but there used for FLcDNA density), i.e. as a kind of density heat-map with variable window sizes.

http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000740

Cheers Uli

ADD COMMENT
0
Entering edit mode

Hi Uli, you can use the autoplot VCF option implemented in the ggbio [R] package. Check this out : section 6.2.15 : http://www.bioconductor.org/packages/2.11/bioc/vignettes/ggbio/inst/doc/ggbio.pdf Pretty easy to handle Best, C.

ADD REPLY
0
Entering edit mode
11.8 years ago
Vitis ★ 2.6k

I used betools to intersect the SNP list and a file delimits the sliding windows, then plot the SNP density with simple line graph in R. It's not as sophisticated as the R code provided above but it works pretty well except you have to prepare the window interval files.

ADD COMMENT

Login before adding your answer.

Traffic: 2001 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6