Question

Is there a way to do read filtering (MAPQ> certain value) on a BAM instead of SAM file?

0

Entering edit mode

6.0 years ago

msimmer92 ▴ 310

I'm working on Bash scripts for a ChIPseq pipeline for my lab. Even though the ENCODE guideline suggests to remove duplicates, some people here want to not remove duplicates but filter the reads with certain MAPQ values. For this purpose, I am working on a script that does that.

On the Internet and some other people's scripts, all the examples I have seen so far are filtering SAM files (for example, with awk or grep, knowing that the MAPQ value is on the 5th column of the SAM file, it's not a challenge to extract this; let's say it becomes a simple file-management and text-editing problem). Nevertheless, in the pipeline I'm working on, the inputs come in the form of sorted BAMS (because there's another script in the lab that does the mapping, sorting and conversion to BAM).

So I was wondering, is there a way of doing this filtering of MAPQ=certain values from the sorted BAMs I got from the people, without having to ask them for the SAM files? Thank you!

BAM SAM MAPQ quality filtering • 11k views

ADD COMMENT • link 5.9 years ago by msimmer92 ▴ 310

0

Entering edit mode

Thank you, now it's all clear :) . One more question: if instead of wanting to filtering a BAM for mapping quality I wanted to filter out reads with q<20 in a fastq file (so fastq filtering, not bam), which would be the most recommendable way to do it? Thank you again

ADD REPLY • link 5.9 years ago by msimmer92 ▴ 310

0

Entering edit mode

Fastq files contain quality score per base, maybe you want reads with average quality < 20?

ADD REPLY • link 5.9 years ago by h.mon 35k

0

Entering edit mode

Considering both possibilities - per base filtering and per average read score - what is your recommendation? The concept is that I would like to have a fastq file of higher quality, but I'm not sure which approach is best to take.

ADD REPLY • link 5.9 years ago by msimmer92 ▴ 310

score 4 · Accepted Answer · 2018-11-22

4

Entering edit mode

6.0 years ago

h.mon 35k

-q INT only include reads with mapping quality >= INT [0]

    samtools view -h -q 20 file.bam

ADD COMMENT • link 6.0 years ago by h.mon 35k

0

Entering edit mode

This is great, thank you ! And if I wanted to include two conditions? (such as include reads that MAPQ==1 and MAPQ>20 )?

ADD REPLY • link 6.0 years ago by msimmer92 ▴ 310

1

Entering edit mode

Why would you want to keep MAPQ=1 ? This is pretty bad mapping quality. What are your trying to do ?

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.9k

0

Entering edit mode

A labmate has a script that does that in order to also keep uniquely-mapped reads (which if you have a SAM file made by Bowtie2, it's MAPQ==1), because it seems it's good to have them in ChIPseq data analysis. I'm looking at his code to see if his has certain functions (like that, for example) that mine doesn't and might me good to implement (so, I am open to all your opinions if you think that is not correct/necessary). We always use Bowtie2 for ChIPseq mapping, that's why we use those numbers in particular (uniquely mapped reads ==1 and reads with quality >4). I clarify because depending on the mapper, these numbers might change.

ADD REPLY • link 6.0 years ago by msimmer92 ▴ 310

2

Entering edit mode

MAPQ is a mapping quality, not the number of multi mapped location for a read.

If you want to keep uniquely-mapped reads + MAPQ > 20

samtools view -h -F 256 -q 20 file.bam

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.9k

0

Entering edit mode

Thank you, I will show this to the people in my lab. I agree with you. There is a mistake in his code, then. I read also the Bowtie2 manual in detail and also states the higher the MAPQ the higher the "uniqueness" of the read, so it doesn't make sense to keep the MAPQ==1 reads.

ADD REPLY • link 6.0 years ago by msimmer92 ▴ 310

1

Entering edit mode

Actually, I never filter mapped reads on mapping quality. If a read is badly mapped on the reference, Bowtie2 will not align it. I say that, the mapped reads fit good enought to be mapped so I keep them all.

Also, take a look at this

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.9k

0

Entering edit mode

To be clear, you use "-F 256", that is the flag of the secondary-alingment reads, to filter those ones in particular, right? (I'm making sure I got that right). The logic would be "from the secondary alignment reads, filter out those ones with q smaller than 20"

ADD REPLY • link 6.0 years ago by msimmer92 ▴ 310

1

Entering edit mode

No, with this command you filter out not primary alignment and you also filter out primary alignment with less than 20 MAPQ. In other words you keep only primary alignment with MAPQ >= 20

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.9k

0

Entering edit mode

Thank you, it's clear now. One clarification: is this depending on the sorting of the bam? (If I do this filtering without having the bam sorted, would I have a problem? ) Thank you

ADD REPLY • link 5.9 years ago by msimmer92 ▴ 310

1

Entering edit mode

I don't think sorted bam is necessary to filter out but it's always a good point to sort it

ADD REPLY • link 5.9 years ago by Bastien Hervé 5.9k