Do you know any tools for plotting SNPs density on chromosomes?
Do you know any tools for plotting SNPs density on chromosomes?
I prefer a ggplot2-solution:
And then in R:
snps<-read.table("snp132_ucsc_hg19.bed",sep="\t",header=F)
colnames(snps)<-c("chr","start","end","id","score","strand")
# put the chromosomes in the good order: chr1, chr2, chr22, chrX
goodChrOrder <- paste("chr",c(1:22,"X","Y"),sep="")
snps$chr <- factor(snps$chr,levels=goodChrOrder)
# Plot the densities of snps in the bed file for each chr seperately
library(ggplot2)
snpDensity<-ggplot(snps) +
geom_histogram(aes(x=start),binwidth=1e6) + # pick a binwidth that is not too small
facet_wrap(~ chr,ncol=2) + # seperate plots for each chr, x-scales can differ from chr to chr
ggtitle("Density of SNP-132 across hg19") +
xlab("Position in the genome") +
ylab("SNP density") +
theme_bw() # I prefer the black and white theme
# save the plot to .png file
png("snp132_density.png",500,750)
print(snpDensity)
dev.off()
Yes, prepare your data in such a way that you have a data frame where each row represents a feature (gene/transcript/variant) containing columns: chromosome, position, group. Group contains information whether it is normal or disease. Then do:
ggplot(yourData) +
geom_histogram(aes(x=position),binwidth=1e6) +
facet_grid(group ~ chr)
Hi Irsan,
I know that this is a old post but, could you think that could be possible to perform this kind of plot but having as input the coverage for each position for each chromosome? Or maybe the files are too big to upload to R? Also I would like to plot a using geom_line instead of histogram, what do you think?
Thanks in advance
Yes, you're right. I'm just trying to figure out how to plot the coverage across all bases for each chromosome using a line plot. And of course, I'm having problems of memory. So I tried to use your approach for SNPs but it does not work due to the "infinite" number of points in the data.
Hello,
am wondering whether this way is useful also for my data. Basically I have a bed file with Genomic Regions. I gave a score to all of these regions and its value range from -10 to 10. So the bed file looks like this
Chr Start End Score
chr1 1 400 -3
chr1 5000 5400 +10
etc..
I would like to plot the score position for position. Can I use this script?
Thanks for any reply.
hello isran, i am vinay kumar reddy, i have vcf file of samples with different snp positions on different chromosomes, so now i want to plot this positions along chromosomes, i am trying to do it with synbreed r package and i could not succeed, can you please help me out. my vcf files consists several columns but first column is of different chromosomes and second column is of different positions on all chromosomes.
thanks, vinay kumar reddy
Hi,
I don't know if such a program exists but I created a function from a small R script I wrote to address such problems of visualization. I think this can cover your need. Here is the function with some examples to test it.
ChrDensity <- function(ChrLength, Positions, Values, ChromosomeBgColor="gray90", ChromosomeBorderColor="black", ValuesColors="black", PlottingGenes=F, GeneColors="black", ylab="Density", xlab="Chromosomal position (Mb)", ...) {
PositionsMb <- Positions / 1000000
ChrLengthMb <- ChrLength / 1000000
plot(NA,NA, xlab=xlab, ylab=ylab , bty="n", xlim=c(0,ChrLengthMb), ylim=c(-1.1,max(Values)), ...)
rect(0, -0.1, ChrLengthMb,-1, col=ChromosomeBgColor, border=ChromosomeBorderColor)
nbG <- length(Positions)
segments(PositionsMb, rep(0,nbG), PositionsMb, Values, col=ValuesColors)
if(PlottingGenes) {
segments(PositionsMb, rep(-0.1,nbG), PositionsMb, rep(-1,nbG), col=GeneColors)
}
}
### Example
ChrLength <- 100000000 # Chromosome of 100 Mb
Positions <- sample(1:100000000, 1000, replace=F)
Values <- sample(1:100 / 10, 1000, replace=T)
# Simple Call
ChrDensity(ChrLength, Positions, Values, main="Test plot")
# Plotting genes and adding specific colors for subset of genes
Colors <- rep("black", length(Positions))
Colors[order(Positions)[200:400]] <- "red"
ChrDensity(ChrLength, Positions, Values, PlottingGenes=T, ValuesColors=Colors, GeneColors=Colors, main="Test plot", ylab="Nb of SNPs")
And here is an example of plots it generates.
Basic plot
Enhanced plot, plotting gene positions and coloring a subset of those.
This can be of course improved. Let me know if you have some suggestions or encounter some bugs.
ascii art and mysql :-)
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
mysql> select chrom,chromStart,count(*),repeat('#',(count(*)/50000.0)*10) from snp137
where chrom in("chr1","chr2","chr3","chr4","chr5","chr6") group by chrom,ROUND(chromStart/10E6) ;
+-------+------------+----------+------------------------------------------------------+
| chrom | chromStart | count(*) | repeat('#',(count(*)/50000.0)*10) |
+-------+------------+----------+------------------------------------------------------+
| chr1 | 10144 | 115192 | ####################### |
| chr1 | 5000009 | 195026 | ####################################### |
| chr1 | 15000014 | 203190 | ######################################### |
| chr1 | 25000076 | 172068 | ################################## |
| chr1 | 35000040 | 177274 | ################################### |
| chr1 | 45000036 | 167085 | ################################# |
| chr1 | 55000027 | 182151 | #################################### |
| chr1 | 65000016 | 183073 | ##################################### |
| chr1 | 75000010 | 182679 | ##################################### |
| chr1 | 85000178 | 179926 | #################################### |
| chr1 | 95000113 | 176064 | ################################### |
| chr1 | 105000003 | 182362 | #################################### |
| chr1 | 115000059 | 132044 | ########################## |
| chr1 | 142535438 | 115694 | ####################### |
| chr1 | 145000015 | 205771 | ######################################### |
| chr1 | 155000030 | 188877 | ###################################### |
| chr1 | 165000030 | 181378 | #################################### |
| chr1 | 175000030 | 177862 | #################################### |
| chr1 | 185000003 | 183398 | ##################################### |
| chr1 | 195000018 | 182054 | #################################### |
| chr1 | 205000033 | 209352 | ########################################## |
| chr1 | 215000047 | 179814 | #################################### |
| chr1 | 225000055 | 187002 | ##################################### |
| chr1 | 235000110 | 195962 | ####################################### |
| chr1 | 245000033 | 93682 | ################### |
| chr2 | 10132 | 105475 | ##################### |
| chr2 | 5000051 | 190281 | ###################################### |
| chr2 | 15000021 | 177880 | #################################### |
| chr2 | 25000059 | 188206 | ###################################### |
| chr2 | 35000011 | 212628 | ########################################### |
| chr2 | 45000016 | 206472 | ######################################### |
| chr2 | 55000150 | 181265 | #################################### |
| chr2 | 65000062 | 180104 | #################################### |
| chr2 | 75000066 | 198265 | ######################################## |
| chr2 | 85000017 | 168042 | ################################## |
| chr2 | 95326292 | 167528 | ################################## |
| chr2 | 105000006 | 185158 | ##################################### |
| chr2 | 115000210 | 185350 | ##################################### |
| chr2 | 125000054 | 235282 | ############################################### |
| chr2 | 135000003 | 182175 | #################################### |
| chr2 | 145000126 | 172680 | ################################### |
| chr2 | 155000007 | 167622 | ################################## |
| chr2 | 165000065 | 192062 | ###################################### |
| chr2 | 175000114 | 175360 | ################################### |
| chr2 | 185000008 | 167461 | ################################# |
| chr2 | 195000029 | 162383 | ################################ |
| chr2 | 205000055 | 179149 | #################################### |
| chr2 | 215000027 | 184243 | ##################################### |
| chr2 | 225000030 | 188300 | ###################################### |
| chr2 | 235000034 | 167767 | ################################## |
| chr3 | 60112 | 120153 | ######################## |
| chr3 | 5000009 | 204211 | ######################################### |
(...)
| chr6 | 55000001 | 149141 | ############################## |
| chr6 | 65000008 | 194800 | ####################################### |
| chr6 | 75000078 | 198436 | ######################################## |
| chr6 | 85000026 | 190050 | ###################################### |
| chr6 | 95000043 | 174975 | ################################### |
| chr6 | 105000058 | 175326 | ################################### |
| chr6 | 115000032 | 182749 | ##################################### |
| chr6 | 125000145 | 177503 | #################################### |
| chr6 | 135000000 | 175307 | ################################### |
| chr6 | 145000003 | 189212 | ###################################### |
| chr6 | 155000006 | 195229 | ####################################### |
| chr6 | 165000019 | 132148 | ########################## |
+-------+------------+----------+------------------------------------------------------+
128 rows in set (2 min 17.24 sec)
Hello there,
I am trying to plot SNP density using circos which can do the same thing. However, both circos and your plot are asking for an input called value i.e
ChrLength, Positions, Values and in circos (Chromosome) hsX (genomic position) 70700000 70799999 (value) 0.00156?
I am not sure what is this value and how to obtain it, any suggestion ?
Thanks
Hi, I just wonder whether any updates about SNP density plots have been published.
I have few chromosome VCF files i.e. each chromosome is in a separately VCF file. I like @Irsan solution, but just wonder whether it would be possible:
Thank you in advance
Mic
Hej, does anyone know an R package that can plot SNP densities like in the attached paper (figure 2, but there used for FLcDNA density), i.e. as a kind of density heat-map with variable window sizes.
http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000740
Cheers Uli
Hi Uli, you can use the autoplot VCF option implemented in the ggbio [R] package. Check this out : section 6.2.15 : http://www.bioconductor.org/packages/2.11/bioc/vignettes/ggbio/inst/doc/ggbio.pdf Pretty easy to handle Best, C.
I used betools to intersect the SNP list and a file delimits the sliding windows, then plot the SNP density with simple line graph in R. It's not as sophisticated as the R code provided above but it works pretty well except you have to prepare the window interval files.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
HI, I just have a bed file ,not snp . I wanna view genes and bed region genome distribuition . my bed file is like this : chr1 222 333 Gene1 chr2 223 344 Gene2 .....