According to its manual, BWA MEM is designed for reads 70 bp to 1 Mbp long. My reads, from an Illumina DNA-seq WGS experiment, vary from 35 bp to 301 bp. In this case, do I need to filter out reads shorter than 70 bp for BWA MEM to work properly, or is it advisable to run it without filtering any reads?
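If you want to test whether a length cutoff makes any difference, one option is to map the reads both unfiltered and filtered and compare the results. The Python sketch below covers only the filtering step. It assumes plain (uncompressed) 4-line FASTQ with no wrapped sequence lines, and the function names and the 70 bp default are illustrative choices, not part of any tool. For paired-end data, filtering each file independently would break the mate pairing, so both mates of a pair would need to be kept or dropped together.

```python
from typing import Iterable, Iterator, TextIO, Tuple

# (header, sequence, '+' line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header, seq, plus, qual


def filter_by_length(records: Iterable[Record], min_len: int = 70) -> Iterator[Record]:
    """Keep only reads whose sequence is at least min_len bases long."""
    for rec in records:
        if len(rec[1]) >= min_len:
            yield rec


def write_fastq(records: Iterable[Record], out: TextIO) -> None:
    """Write records back out as 4-line FASTQ."""
    for rec in records:
        out.write("\n".join(rec) + "\n")
```

Usage would look like `write_fastq(filter_by_length(read_fastq(fh), 70), out_fh)`, where `fh` and `out_fh` are text-mode file handles you have opened yourself.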
I have another question regarding mapping. These may be very simple questions, but since I'm new to this kind of analysis I need help from people with experience.
Many people say that mapping tools can now handle poor-quality bases. So I decided to map one data set with BWA MEM as-is, without QC, even though quality degrades towards the 3' end, and also to trim the 3' ends of the same data set and then map it.
I'm going to use bedtools coverage/genomecov or samtools stats to generate mapping statistics for these two cases and see which mapping is better. Do you think this would give me a fair answer?
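For the samtools stats route, the lines starting with `SN` (the summary numbers section) already hold most of what such a comparison needs. The sketch below pulls them into a dictionary so the untrimmed and trimmed runs can be compared side by side. Because trimming removes bases, comparing fractions and rates (mapped reads / total reads, error rate) is probably fairer than comparing raw counts, though which metrics to weigh is a judgement call. The helper names are hypothetical.

```python
from typing import Dict, Iterable


def parse_samtools_sn(lines: Iterable[str]) -> Dict[str, float]:
    """Collect the 'SN' summary lines of `samtools stats` output as key -> number."""
    summary: Dict[str, float] = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue
        # Format: SN<TAB>key:<TAB>value[<TAB># comment]
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        key = fields[1].rstrip(":")
        try:
            summary[key] = float(fields[2])
        except ValueError:
            pass  # ignore anything that is not a plain number
    return summary


def mapping_metrics(sn: Dict[str, float]) -> Dict[str, float]:
    """Pick out a few metrics that are useful when comparing two runs."""
    total = sn.get("raw total sequences", 0.0)
    mapped = sn.get("reads mapped", 0.0)
    return {
        "fraction_reads_mapped": mapped / total if total else 0.0,
        "error_rate": sn.get("error rate", float("nan")),
        "bases_mapped_cigar": sn.get("bases mapped (cigar)", float("nan")),
    }
```

Run `samtools stats` on each BAM (for example `samtools stats untrimmed.bam > untrimmed.stats.txt`; the file names are placeholders), then pass each opened stats file to `parse_samtools_sn` and the result to `mapping_metrics`.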
That's a fair enough strategy. It's generally best practice to trim off adapters and really low-quality bases (Phred quality below 5 or so) from the ends, but nothing more stringent. Even the gains from this probably aren't huge any more; sequencing quality is pretty good these days.
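A minimal sketch of 3' quality trimming at the low cutoff suggested above (Phred 5). Rather than cutting at the first low-quality base, it applies the running-sum rule the BWA manual describes for `bwa aln -q`: if the last base is below the cutoff, trim the read to the position x that maximises the sum of (cutoff - q_i) over the bases after x. It assumes Phred+33 quality encoding (current Illumina output) and handles quality only; adapter removal is a separate step usually done with a dedicated tool such as cutadapt or Trimmomatic. The function names are hypothetical.

```python
from typing import List, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 (Sanger / Illumina 1.8+) quality string."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(quals: List[int], cutoff: int = 5) -> int:
    """Return how many leading bases to keep after 3' quality trimming.

    If the last base is below `cutoff`, keep the first x bases, where x
    maximises sum(cutoff - q_i) over the bases after position x.
    """
    n = len(quals)
    if n == 0 or quals[-1] >= cutoff:
        return n  # last base is good enough: no trimming
    best_sum, keep = 0, n
    running = 0
    # Walk from the 3' end, accumulating (cutoff - q) over the trimmed tail.
    for x in range(n - 1, -1, -1):
        running += cutoff - quals[x]
        if running > best_sum:
            best_sum, keep = running, x
    return keep


def trim_record(seq: str, qual: str, cutoff: int = 5) -> Tuple[str, str]:
    """Trim a read's sequence and quality string to the same length."""
    keep = trim_3prime(phred33(qual), cutoff)
    return seq[:keep], qual[:keep]
```

Compared with stopping at the first base under the cutoff, the running sum tolerates an isolated good base inside a poor-quality tail, so the whole tail is removed instead of leaving a few low-quality bases attached.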
I agree, although I tend to use a much higher quality cutoff for analyses such as assembly. This also depends heavily on the expected coverage, and we generally settle on the cutoff only after making a k-mer frequency plot and running preQC to look for idiosyncrasies.
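The data behind a k-mer frequency plot is a k-mer spectrum: for each multiplicity n, the number of distinct k-mers seen exactly n times. A toy Python version is sketched below. It counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) and skips any k-mer containing a non-ACGT base. It is only meant to show the idea; for real WGS data a dedicated counter such as Jellyfish or KMC would normally be used, and the choice of k is left open here.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_spectrum(seqs: Iterable[str], k: int) -> Dict[int, int]:
    """Map multiplicity n -> number of distinct canonical k-mers seen n times."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if _ACGT.issuperset(kmer):  # skip k-mers containing N or other codes
                counts[canonical(kmer)] += 1
    # Histogram of the counts themselves, sorted by multiplicity.
    return dict(sorted(Counter(counts.values()).items()))
```

Plotting the returned values (multiplicity on the x axis, number of distinct k-mers on the y axis) gives the usual k-mer frequency plot.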
Thank you for clearing up my doubt.