Hi,
While analyzing some amplicons we decided to increase the mapping stringency in post-sequencing processing to make sure the data was free of debris. When we raised the minimum mapq cutoff to 18 (using samtools), we noticed that the majority of one particular amplicon's reads had mapq values of 10 or lower.
Amp1 below shows normal data from a different amplicon, and Amp2 is the amplicon we have questions about.
mapq      Amp1 count   Amp2 count
<=10          11          986
11-20          7            5
21-30          3            0
31-40          1            0
41-50          0            1
51-60          6            0
61-70         12            1
71-80         79            1
81-90        316            0
91-100       100            0
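For anyone who wants to reproduce this kind of tally, here is a minimal sketch of the binning in Python. It assumes the mapq values have already been pulled out of the alignments (for example from column 5 of a SAM file); the function name `bin_mapq` and the `width` parameter are made up for illustration and are not part of samtools.

```python
from collections import Counter


def bin_mapq(mapq_values, width=10):
    """Count mapq values in bins: <=width, then (width+1)-(2*width), and so on.

    Hypothetical helper for illustration only, not a samtools function.
    """
    counts = Counter()
    for q in mapq_values:
        if q <= width:
            # Everything at or below the first boundary goes in one bin.
            counts[f"<={width}"] += 1
        else:
            # Bins start at 11, 21, 31, ... when width is 10.
            low = ((q - 1) // width) * width + 1
            counts[f"{low}-{low + width - 1}"] += 1
    return counts


# Made-up example values, not the real data from the table above.
print(bin_mapq([0, 5, 18, 37, 60]))
```

Running it once per amplicon gives the two count columns side by side.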
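We looked into the trim function and found that the default quality cutoff is 16 and the window size is 30. A Phred score of 16 corresponds to a per-base accuracy of about 97.5%, so the bases that survive trimming should be roughly that accurate or better.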
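The 97.5% figure comes from the standard Phred definition, Q = -10 * log10(P_error). A quick sketch of the conversion, assuming the trim cutoff is a plain Phred score (the function name is just for illustration):

```python
def phred_to_accuracy(q):
    """Return the per-base accuracy implied by Phred score q."""
    # Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10).
    error_probability = 10 ** (-q / 10)
    return 1.0 - error_probability


print(f"{phred_to_accuracy(16):.4f}")  # prints 0.9749
```

Note that this only describes the confidence of the base calls, not how uniquely a read places on the reference.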
Don't good Phred values also give a good mapping score? This assumes my reference is OK, and it seems to be.
Any advice would be appreciated. Thanks.
Great point. I could see that happening with our reference. What's a better way to clean up our data if not mapq?