Truncate unevenly covered BAM file at certain coverage
2
0
Entering edit mode
3.5 years ago
predeus ★ 2.1k

Hello all,

This is basic enough question, but I can't seem to find the answer online.

I have a BAM from RNA-seq alignment which is very unevenly covered. However the mean coverage is huge (~100,000x - it's a mitochondrion). I want to use it for polishing, so I would like to retain about 100x "sliding window" (or simply at any given nucleotide) coverage across the whole chromosome, and lose the rest.

How can I achieve this?

Thank you in advance, as always

UPD: Apparently, there's a discussion here that directly answers my question.

samtools bam • 2.2k views
ADD COMMENT
1
Entering edit mode
3.5 years ago

use the option -s of samtools view

ADD COMMENT
1
Entering edit mode

Nice, my current version of samtools (1.7) doesn't even have that yet! Thank you very much.

ADD REPLY
0
Entering edit mode

All right, I've read about the option and still can't figure out how to use the "filter expressions" for coverage-based filtering. I might be missing something obvious...

ADD REPLY
0
Entering edit mode

my bad. I meant option -s not option -e. (fixed)

  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
           fraction of templates/read pairs to keep; INT part sets seed)
ADD REPLY
0
Entering edit mode

I see. I knew about -s but this is not what I'm after - I don't want to proportionally downsample the reads, I only want to remove reads in the areas where coverage is > 100, for example.

ADD REPLY
1
Entering edit mode
ADD REPLY
0
Entering edit mode

Brilliant, I didn't find that thread somehow (it's tricky to select the right keywords). Thank you!

Brian's explanation about normalization via k-mers also makes a lot of sense - it has long puzzled me what Trinity was doing for example.

ADD REPLY
1
Entering edit mode
3.5 years ago
QLFblaireau ▴ 30

You could use samtools depth

for example

samtools depth -a my.bam |awk '$3 >=100' > file.coverage

Then make a bedfile out of the file.coverage and use it in combination with samtools view to only extract position where there is the expected coverage.

Or easier, you could use mosdepth that will directly output a bed file that you can filter.

mosdepth

ADD COMMENT
0
Entering edit mode

Thank you! However, I don't want to find regions with some type of coverage - I want to remove the excessive coverage, retaining reads up to a certain maximum, e.g. 100.

This approach also can work (together with samtools view -s), but it becomes dreadfully convoluted - you need to

  • split bam into windows;
  • downsample each file according to mean coverage in each window;
  • put them back together.

Seems like there should be a tool that's doing it...

ADD REPLY

Login before adding your answer.

Traffic: 1836 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6