Dear All Sorry for bothering. I am doing variant calling using BCFtools mpileup then call and I would like to add a filteration step in the command for mpileup in BCF tool, but really confused about which threshold to be used for a min mapping quality. In my BAM file produced from STAR, most reads have 255 value in the MAPQ field. Now I am confused which number is correct for counting the n of unique mapped reads in STAR alignment BAM file ? I have done this by two ways, which ends up in different results:
- The number/% of unique mapped reads that one can get from the log.files output from STAR aligner
The number that I obtain when running (Most of my BAM file have the 255 value in the MAPQ filed (column 5)
samtools view -q 255 -c filteredfile.bam
So which one of these is the more accurate description of the unique mapped reads ? so that I can use 255 in the filter
so shall I set this to 255 ...
i am using the below command for mpileup in BCF tool
bcftools mpileup -Ou -o Variant/L10A2.bcf -f Ref/GCF_000002315.5_GRCg6a_genomic.fna L10A2_sortedbycordinate_fix_marked.bam -Q 20
Thanks
MAPQ=255 => the mapping quality is meaningless.
see https://samtools.github.io/hts-specs/SAMv1.pdf
Thanks Pierre, This is understandable,. This is meaningless because these are uniquely mapped reads so there is no chance that there MAPQ equal a number. Could you comment on the question: which number to be put in the calling for variant using the BCF tools Could you comment ATpoint
Thanks
This post is very explanatory. As a "raw" filter, in the past I have seen many people using "30". However, you can be more strict if you want, depending on your goal. For variant calling, I personally use a combination of filters including depth, number of reads supporting the alternate allele.. etc.
Thanks Iran, however I do not think 30 is the best because as I mentioned above when running STAR aligner, and output file that only contains mapped reads, around 80% of the reads will have a MAPQ of 255, which as Pierre said is meaningless. So If I put the threshold for 30, it make no sense there will be no selection ? I went into that post you made, and it tells very clear;y that 255 is the MAPQ of the uniquely mapped reads.
By the way, do you know if one can filter the reads in the VCF file for those that are called from reads with certain MAPQ ? I mean filtering the VCF later for the MAPQ rather than doing it during the mpileup step ?
Best
Well, the point is not to discard as many reads as possible, but to keep the ones that are more likely to be correctly mapped. Therefore, the more reads with 255, the better.
Usually in many VCF you can find an output annotation for each variant with the mean mapping quality of the reads in the given position.