Question

SNP density plot

1

Entering edit mode

8.1 years ago

krp0001 ▴ 40

Hello,

all,

I am working on SNP studies on fungus. I have over 1million snps and i have denovo assembled reference genome of 45Mb. I would like to plot density of SNPs over the genome in sliding window or /best suitable option.

*the vcf file has information on scaffolds and same for reference genome (over 1700).

Could anyone suggest few programmes that can handle larger files?

Thank you krp

SNP R sequencing Assembly • 4.9k views

ADD COMMENT • link updated 5.7 years ago by Biostar 20 • written 8.1 years ago by krp0001 ▴ 40

0

Entering edit mode

How large is the vcf file for 1 million SNPs?

ADD REPLY • link 8.1 years ago by Charles Yin ▴ 180

0

Entering edit mode

Hi my vcf file size is 911M

ADD REPLY • link 8.1 years ago by krp0001 ▴ 40

score 1 · Answer 1 · 2017-10-28

1

Entering edit mode

8.1 years ago

Alex Reynolds 36k

BEDOPS scales to arbitrarily-sized inputs. You could use Kent utilities fetchChromSizes to make a BED file of reference genome chromosome sizes (or your own approach, if you're bringing your own genome), use vcf2bed to convert the VCF file to BED, use bedops --chop to generate windows, and use bedmap to count SNPs over sliding windows.

Generating chromosome bounds, e.g. for hg38:

$ fetchChromSizes hg38 | awk '{ print $1, "0", $2 }' > hg38.bed

Or if you have a de novo genome, you would use your technique of choice to generate a similar bounds file.

Making sliding windows, e.g. 10k windows:

$ bedops --chop 10000 hg38.bed > hg38.windows.bed

Counting SNPs over windows that are padded out 5k on both sides:

$ bedmap --echo --count --range 5000 hg38.windows.bed <(vcf2bed < snps.vcf) > hg38.windows.counts.bed

Using --range this way does counting where consecutive windows overlap by 5kb.

Once you see how the above steps are done, you can concatenate them into a one-liner:

$ fetchChromSizes hg38 | awk '{ print $1, "0", $2 }' | bedops --chop 10000 - | bedmap --echo --count --range 5000 - <(vcf2bed < snps.vcf) > hg38.windows.counts.bed

Using Unix streams in this way eliminates the time cost in making intermediate files, so this scales up for whole-genome inputs.

Once you have a file like hg38.windows.counts.bed (for whatever reference genome and window size and sliding parameters you choose) you could remake it into a Bedgraph file and then into bigWig to plot density in a UCSC genome browser instance. Or plot the signal directly with Circos or R or other tools to view density per-chromosome.

ADD COMMENT • link 8.1 years ago by Alex Reynolds 36k

0

Entering edit mode

I am stuck with a similar question, I've got a list of SNPs (3 columns - chr1 / start / stop). I've managed to produce the bedops windows.bed file (for my organism / bacteria - used 1000 for chop). However, I get some odd results using the bedmap count:

$ bedmap --echo --count --range 500 mybacteria.windows.bed < listofsnps.csv > mybacteria.windows.counts.bed

The results look like this:

chr1 0 1000|2
chr1 1000 2000|3
chr1 2000 3000|3

a similar... whereas the first lines in my list of snps.txt look like this:

chr1 15397 15397
chr1 20973 20973
chr 26518 26518

Any help would be appreciated, perhaps just a simple bin command in R would suffice.

ADD REPLY • link 6.8 years ago by kalfsnes ▴ 10

1

Entering edit mode

$ bedmap --echo --count --range 500 mybacteria.windows.bed SNPs.bed > answer.bed

If listofsnps.csv is a comma-separated file, then you will need to turn that into a sorted BED file (e.g., called SNPs.bed) to make it work with the BED-mapping tool bedmap. How you do this would be up to you, since you know its structure and content.

I don't know what the structure of listofsnps.csv looks like, but you do. I'd suggest converting the file to a sorted BED file and giving it another go. Hope this helps!

ADD REPLY • link 6.8 years ago by Alex Reynolds 36k

1

Entering edit mode

Thank you so much! I did a stupid mistake where the csv file actually was tab/BED format - so I just had to rename to BED and remove header and it solved it.

ADD REPLY • link 6.8 years ago by kalfsnes ▴ 10