Hi all,
I have posted the mapping stats for one of my samples for your information. I am now at the post-mapping stage of exome-seq analysis. I have watched the Broad video on variant calling, the Galaxy tutorial, etc., but this is the first time I am doing exome-seq analysis and I feel stuck and don't know how to proceed!
As you can see, a good proportion of my reads are properly paired (98.26%), and no reads are flagged as duplicates.
My question is:
1. Should I keep only the properly paired reads and carry them forward through indel realignment, BQSR and variant calling? (This way I can be very confident in my variant calls.)
OR
2. Should I filter my BAMs so that they also include mapped singletons (where one mate of the pair is mapped and the other is not)? If so, should I also include reads whose mate maps to a different chromosome, and then proceed to variant calling?
Any direction on the next steps of the analysis would help. Thanks in advance for your time :)
151293750 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
201784 + 0 supplementary
0 + 0 duplicates
149562179 + 0 mapped (98.86% : N/A)
151091966 + 0 paired in sequencing
75545983 + 0 read1
75545983 + 0 read2
148458680 + 0 properly paired (98.26% : N/A)
149256604 + 0 with itself and mate mapped
103791 + 0 singletons (0.07% : N/A)
539982 + 0 with mate mapped to a different chr
423655 + 0 with mate mapped to a different chr (mapQ>=5)
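To see which line each percentage comes from, here is a small Python sketch that recomputes them from the counts in the flagstat output above. It reproduces flagstat's printed numbers when "mapped" is taken relative to all records and "properly paired" and "singletons" relative to reads paired in sequencing. The two extra percentages (both mates mapped, mate on another chromosome) are not printed above; dividing them by the paired count is my own choice of denominator.

```python
# Counts from the QC-passed column of the flagstat output above.
total = 151_293_750
supplementary = 201_784
mapped = 149_562_179
paired = 151_091_966
read1 = read2 = 75_545_983
properly_paired = 148_458_680
both_mapped = 149_256_604
singletons = 103_791
mate_other_chr = 539_982


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, the precision flagstat prints."""
    return round(100 * part / whole, 2)


# Percentages flagstat printed: mapped is relative to all records,
# properly paired and singletons are relative to reads paired in sequencing.
print("mapped           ", pct(mapped, total))             # 98.86
print("properly paired  ", pct(properly_paired, paired))   # 98.26
print("singletons       ", pct(singletons, paired))        # 0.07
# Not printed above; denominator (paired reads) chosen for illustration.
print("both mates mapped", pct(both_mapped, paired))       # 98.79
print("mate on other chr", pct(mate_other_chr, paired))    # 0.36

# Sanity checks for this sample: the categories add up.
assert paired + supplementary == total
assert read1 + read2 == paired
assert both_mapped + singletons + supplementary == mapped
```

So 98.26% is the properly paired fraction; reads with both mates mapped (but not necessarily as a proper pair) are a slightly larger group.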
It is better to use only the properly paired reads, unless you are looking for rare (low-frequency) variants.
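Keeping only properly paired reads comes down to testing bits of the SAM FLAG field. As a minimal sketch, assuming plain SAM text lines rather than any particular BAM library, the Python below sorts each record into the categories from the question (proper pair, singleton, mate on another chromosome) and filters on them. The category names and the `OPTION_1`/`OPTION_2` sets are illustrative labels of my own, not part of any tool. On real BAMs, option 1 is roughly `samtools view -b -f 2 in.bam > proper.bam`, which keeps reads with the proper-pair bit set.

```python
# SAM FLAG bits (values from the SAM format specification).
PAIRED = 0x1           # read is part of a pair
PROPER_PAIR = 0x2      # pair aligned "properly" according to the aligner
UNMAPPED = 0x4         # this read is unmapped
MATE_UNMAPPED = 0x8    # its mate is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary (chimeric) alignment


def classify(flag: int, rname: str, rnext: str) -> str:
    """Assign one SAM record to the read categories discussed in this thread."""
    if flag & (SECONDARY | SUPPLEMENTARY):
        return "non-primary"
    if flag & UNMAPPED:
        return "unmapped"
    if not flag & PAIRED:
        return "unpaired"
    if flag & MATE_UNMAPPED:
        return "singleton"            # option 2: read mapped, mate not
    if flag & PROPER_PAIR:
        return "proper-pair"          # option 1
    if rnext not in ("=", rname):
        return "mate-other-chr"       # mate mapped to a different chromosome
    return "improper-same-chr"        # e.g. unexpected insert size or orientation


def keep_record(line: str, allowed: set) -> bool:
    """Decide whether a SAM text line passes the filter; header lines always pass."""
    if line.startswith("@"):
        return True
    fields = line.rstrip("\n").split("\t")
    flag, rname, rnext = int(fields[1]), fields[2], fields[6]
    return classify(flag, rname, rnext) in allowed


OPTION_1 = {"proper-pair"}
OPTION_2 = OPTION_1 | {"singleton", "mate-other-chr"}

sam_lines = [
    "@SQ\tSN:chr1\tLN:248956422",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",    # proper pair
    "r2\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",      # mate unmapped
    "r3\t65\tchr1\t900\t60\t4M\tchr5\t1200\t0\tACGT\tIIII",  # mate on chr5
]

for option in (OPTION_1, OPTION_2):
    print([l.split("\t")[0] for l in sam_lines if keep_record(l, option)])
# ['@SQ', 'r1']
# ['@SQ', 'r1', 'r2', 'r3']
```

Checking the proper-pair bit first means option 1 never needs to look at mate chromosomes at all; option 2 only widens the allowed set.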
Thanks JC. I will do a first pass using properly paired reads, then come back for a second pass to screen for rare variants.
I would still run MarkDuplicates; I am not convinced there are truly no duplicates. If anyone has thoughts on this, please feel free to comment.
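flagstat's duplicates line only counts records whose 0x400 FLAG bit is set, so "0 + 0 duplicates" may simply mean duplicates were never marked in this BAM. To illustrate what marking does, here is a deliberately simplified Python sketch of the idea behind MarkDuplicates: read pairs whose two ends start at the same 5' positions on the same strands are treated as copies of one fragment, and all but the pair with the highest summed base quality get the 0x400 bit. The `ReadPair` class and its fields are hypothetical, and the sketch ignores soft clipping, libraries/read groups, optical duplicates and unpaired reads, so on real data run MarkDuplicates itself.

```python
from collections import defaultdict
from dataclasses import dataclass

DUPLICATE = 0x400  # SAM FLAG bit counted on flagstat's "duplicates" line


@dataclass
class ReadPair:
    """Hypothetical minimal view of one mapped read pair."""
    name: str
    chrom: str
    start1: int      # 5' position of read 1 (assumed already strand-aware)
    reverse1: bool
    start2: int      # 5' position of read 2
    reverse2: bool
    qual_sum: int    # sum of base qualities over both reads
    flag: int = 0


def mark_duplicates(pairs: list) -> int:
    """Flag all but the highest-quality pair at each fragment position; return the count."""
    groups = defaultdict(list)
    for p in pairs:
        key = (p.chrom, p.start1, p.reverse1, p.start2, p.reverse2)
        groups[key].append(p)
    marked = 0
    for group in groups.values():
        best = max(group, key=lambda p: p.qual_sum)
        for p in group:
            if p is not best:
                p.flag |= DUPLICATE
                marked += 1
    return marked


pairs = [
    ReadPair("a", "chr1", 1000, False, 1250, True, 3000),
    ReadPair("b", "chr1", 1000, False, 1250, True, 3100),  # same ends as "a"
    ReadPair("c", "chr1", 1000, False, 1300, True, 3050),  # different mate end
]
print(mark_duplicates(pairs))                          # 1
print([p.name for p in pairs if p.flag & DUPLICATE])   # ['a']
```

Only after a step like this does flagstat have any 0x400 bits to count.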
OK, a couple of updates on this since my last post:
Just FYI for anyone looking for a follow-up on this post.