I have a SAM format file. Would anyone tell me how I should get all coordinates with coverage higher than 5 reads within a specific region (e.g. from chr1 1 to chr 1000)?
I have a SAM format file. Would anyone tell me how I should get all coordinates with coverage higher than 5 reads within a specific region (e.g. from chr1 1 to chr 1000)?
If you have SAM, first convert to BAM:
samtools view -Sb data.sam > data.bam
samtools sort data.bam data.sorted
Then use the bedtools genomecov tool to create BEDGRAPH output:
bedtools genomecov -ibam data.sorted.bam -bg > data.bedgraph
If you want to isolate solely those regions with coverage > 5:
awk '$4 > 5' data.bedgraph > data.gt5.bedgraph
I believe the command "samtools depth" will return a count at every location. It's too much output data, so youll need some kind of streaming parse and filter. I have one in perl somewhere that keeps track of the regions and outputs locations only when thresholds are crossed, collapsing the bed. Others have mentioned bedtools, which might do that for you too.
1) First create a small bam file containing reads of your region of interest.
samtools view -bh input.sorted.bam chr1:100,000-200,000 (region of interest) > regionofinterest.bam
2) Then use older version of samtool (that has a deprecated pileup function) to create a pileup file.
samtools pileup -vcf ref.fa regionofinterest.bam > regionofinterest.pileup
3) You can then write a very simple script that will go through the pileup format and count the number of reads for a coordinate and output one with greater than 5 reads.
Read more about pileup format here: http://en.wikipedia.org/wiki/Pileup_format
There are one-liner solution available with bedtools but I want you to understand basic workflows in NGS analysis.
Here is a similar post. Check for bedtools usage in this post tools to calculate average coverage for a bam file? if you dont want to follow my advice.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
You can pass
samtools depth
a region and do a quick filtering withawk
:Thanks so much!