Filtering Out Reads Based On Base Quality With Samtools Seems Not To Work Properly
1
3
Entering edit mode
12.8 years ago
Pascal ★ 1.5k

We have an issue with the -Q parameter in samtools mpileup: it seems not working properly.

In a bam file I know that the all reads have base Q = 40, the following commands doesn't filter out nothing:

samtools mpileup -Q 93 out_sorted.bam

Here's a complete workflow to reproduce quickly the issue we are experiencing based on simulated data. It generates a bam file and converts all read base qualities to 40 then samtools mpileup is called with a minimum of 93:

samtools faidx human_g1k_v37.fasta 20:1-5000000 > ref.fasta
wgsim -N 200000 -r 0 -R 0 -X 0 ref.fasta out.read1.fq out.read2.fq > wgsim.out
awk ' /222/ { gsub("2", "I"); print $0; next } { print } ' out.read1.fq > out.read1b.fq  
awk ' /222/ { gsub("2", "I"); print $0; next } { print } ' out.read2.fq > out.read2b.fq
bwa index ref.fasta 
bwa aln ref.fasta out.read1b.fq -f out1.sai
bwa aln ref.fasta out.read2b.fq -f out2.sai
bwa sampe -f out.sam -r '@RG\tID:foo\tSM:bar' ref.fasta out1.sai out2.sai out.read1b.fq out.read2b.fq
samtools view -hbS out.sam -o out.bam
samtools sort out.bam out_sorted
samtools index out_sorted.bam
samtools mpileup -Q 93 out_sorted.bam > mpileup.txt

Please let me know if I did not understand something or if there is a real issue here with the -Q function.

samtools • 7.9k views
ADD COMMENT
0
Entering edit mode

perhaps 93 is an invalid value? just guessing try 92 or 60 or even 41

ADD REPLY
2
Entering edit mode
12.5 years ago
FGV ▴ 170

I'm under the impression that the -Q and -q option won't filter out anything from the pileup file. That is, even reads/bases are below the threshold they will still show up there. SAMTOOLS just won't take them into account when calculating genotype likelihoods (-g).

Also, -Q option in the manual: "-Q INT Minimum base quality for a base to be considered [13] "

which leads me to think that SAMTOOLS won't throw away entire read, but just ignore bases with lower than -Q quality when computing genotype likelihoods (GL)...

ADD COMMENT

Login before adding your answer.

Traffic: 1838 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6