Larger number of reads after filtering 'chrM'??
2
1
Entering edit mode
8.0 years ago
cpc5 ▴ 50

Hi all,

I want to remove all mitochondrial reads from my BAM files, but after doing so the number of mapped reads increased. Here's what I did after indexing:

samtools idxstats .bam | cut -f 1 | grep -v chrM | xargs samtools view -b  .bam > clean.bam

However comparing with wc -l says:

samtools view -F 0x904 -c clean.bam 
19452291
samtools view -F 0x904 -c .bam
18328474

The second number matches the summary output by tophat2 (fr-frstrand)

any suggestions?

RNA-Seq sequencing alignment • 3.4k views
ADD COMMENT
1
Entering edit mode

I am using the exact same command, except that I put chrM in quotation marks. Also, don't forget the -h flag in the last view, as many tools need the BAM header to work properly (or work at all).

ADD REPLY
0
Entering edit mode

He's outputting to BAM (-b) so it will always have a header. It's only SAM where it's optional for some reason.

ADD REPLY
5
Entering edit mode
8.0 years ago

Your approach looks pretty strange to me, and I would instead do the following:

samtools view -h yourbam.bam | grep -v 'chrM' | samtools view -b > yourbamwithoutchrM.bam
ADD COMMENT
2
Entering edit mode

This will also filter out reads whose mate maps to chrM which I'm not sure the OP wants to do.

ADD REPLY
1
Entering edit mode

Hm, you are probably right. I'm not sure why OP would want to remove reads mapping to chrM in the first place. But I think the event you describe is rare and discordant reads like that aren't really desirable.

That said, I'm not sure if there is a good reason to start tampering with bam files.

ADD REPLY
1
Entering edit mode

All the likes.

Also OP i ran your thing on some single-end data and it worked for me giving the correct result. I'd try WDC's method and if that gives the same result, update your samtools.

ADD REPLY
0
Entering edit mode

Wdc method gave similar output. I also tracked back to another data set where I did the same and it worked fine. There must be something odd with my data. WdC, I was asked by my PI to remove chrM, not much I can do about that. Thanks for your answers everyone

ADD REPLY
1
Entering edit mode

What does just "samtools idxstats .bam | cut -f 1 | grep -v chrM " show? I'm curious. I bet there's something interesting in there like a chromosome mentioned twice.

ADD REPLY
0
Entering edit mode
8.0 years ago
cpc5 ▴ 50

I found the issue! this is with samtools 1.2 with reads mapped to gencode v25

the increased number is due to chr1 doubling in number after sorting by position (default). I still dont know why this happens though.

no chrM:
chr1    248956422   5140060 0
all:
chr1    248956422   2570030 0

maybe it has something to do with the reads named, GL00000x.x...

ADD COMMENT

Login before adding your answer.

Traffic: 1846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6