search for low coverage regions on bam file
2
1
Entering edit mode
7.6 years ago
mary ▴ 10

Trying to find regions with a coverage less than 9 in a BAM file, I used:

bedtools genomecov -bga -ibam infile.bam | awk '$4<9' > lessthan9cov.bed

The output file is something like:

scaffold1    12861    12866    7
scaffold1    12866    12877    1
scaffold1    12877    12896    3
scaffold1    31141    31193    8
scaffold2    31193    31229    5
..

This output has too many rows. Is there a way to have a file which for any continuous region with any coverage less than 9, has a single row?

The desired file should have:

scaffold1    12861    12896
scaffold1    31141    31193
scaffold2    31193    31229

Thank you

bedtools genomecov awk coverage • 5.2k views
ADD COMMENT
0
Entering edit mode

Is your genomecov command correct? Don't you need to specify genome file?

ADD REPLY
0
Entering edit mode

As much as I know, when working with BAM files, you do not need to specify genome file.

ADD REPLY
2
Entering edit mode
7.6 years ago
PoGibas 5.1k

Yes, there is a way.

  1. Use awk to filter regions with wanted coverage
  2. Use bedtools merge to merge those filtered regions

Not tested solution (using OP's code):

bedtools genomecov -bga -ibam infile.bam -g ${fileGenome} | 
    awk '$4 < 9' |
    bedtools merge -i - \
> RESULT.bed
ADD COMMENT
0
Entering edit mode

Thanks, neat and helpful! just the -i option should be removed

ADD REPLY
0
Entering edit mode

Thanks, fixed.

ADD REPLY
0
Entering edit mode

Can this code be modified in a way that low coverage regions of both ends in each contig would not be listed?

ADD REPLY
0
Entering edit mode

What is end of the contig? 1kb?

ADD REPLY
0
Entering edit mode

the original bam file has some mapped scaffolds, each with a defined length. I want to have low coverage regions but not when they are near the beginning or at the end of a scaffold. for example, if scaffold 1 is 2000 bp and has a low coverage region at 4-26 bp or another one at 1900-190 bp these regions would not be outputted to the BED file.

ADD REPLY
1
Entering edit mode
7.6 years ago

using samtools depth:

samtools depth  in.bam | awk 'BEGIN {prevc="";prevp=0;prevd=0;start=0;} {c=$1;p=int($2);d=int($3);if(NR>1 && !(prevc==c && prevp+1==p && prevd==d)) { if(pred<9) printf("%s\t%d\t%d\t%d\n",prevc,start,prevp,prevd);start=p;} prevc=c;prevp=p;prevd=d; } END {if(pred<9) printf("%s\t%d\t%d\t%d\n",prevc,start,prevp,prevd);}'
3   10192625    10192629    12
3   10192630    10192633    11
3   10192634    10192640    12
3   10192641    10192641    13
3   10192642    10192647    12
3   10192648    10192650    10

depends of your needs, you might add the option '-a' (twice) to samtools:

   -a                  output all positions (including zero depth)
   -a -a (or -aa)      output absolutely all positions, including unused ref. sequences

ADD COMMENT

Login before adding your answer.

Traffic: 2015 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6