Hi all!
I have some Illumina 100 bp paired-end FASTQ files that I'm planning to map to a reference genome. There is a long story about how I got these reads; the important point is that I received them without any prior information about their quality or preprocessing, so I decided to run FastQC. Most of them look quite good and seem ready to be mapped, but some others show the following patterns: https://ibb.co/fLLYmQ, https://ibb.co/kEhPt5, https://ibb.co/dtCqY5.
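For context on what FastQC is summarizing: its "Per base sequence quality" plot shows the distribution of Phred scores at each read position, including the mean. Below is a minimal Python sketch of that per-position mean. It assumes Phred+33 encoding (Illumina 1.8+) and well-formed 4-line records. `mean_quality_per_position` is a hypothetical helper for illustration, not part of FastQC.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (l.rstrip("\n") for l in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, qual


def mean_quality_per_position(lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (assumes Phred+33 by default)."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in read_fastq(list(lines)):
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


example = ["@r1", "ACG", "+", "II#", "@r2", "ACG", "+", "I##"]
print(mean_quality_per_position(example))  # [40.0, 21.0, 2.0]
```

In practice you would stream lines from the FASTQ file rather than hold them in a list. This sketch keeps everything in memory only for brevity.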
According to a post on the bio-bwa mailing list on SourceForge (https://sourceforge.net/p/bio-bwa/mailman/bio-bwa-help/thread/530E1378.3040008@cam.ac.uk/), and to what I have heard from several colleagues, the BWA-MEM algorithm is quite good at dealing with low-quality reads on its own. But I am not sure to what extent this is true. Can I safely go ahead and map them, or would it be better to delete the lanes that look problematic? Also, how would another program like Stampy fare in these cases?
You can use BBMap's FilterByTile tool to remove only the reads from the problematic parts of a flowcell. But even without that, the reads should map fine.
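To illustrate the idea behind tile-based filtering, here is a simplified Python sketch. It is not FilterByTile's actual algorithm. It groups reads by (lane, tile), parsed from Illumina 1.8+ headers (`@instrument:run:flowcell:lane:tile:x:y ...`). It then drops every read from tiles whose mean Phred score falls more than `max_drop` below the overall mean. The function names and the default threshold of 5 Phred units are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def tile_key(header: str) -> Tuple[str, str]:
    """Return (lane, tile), assuming an Illumina 1.8+ style read header."""
    fields = header[1:].split()[0].split(":")
    return fields[3], fields[4]


def bad_tiles(records: List[Record], max_drop: float = 5.0, offset: int = 33) -> Set[Tuple[str, str]]:
    """Tiles whose mean base quality is more than max_drop below the overall mean."""
    qsum: Dict[Tuple[str, str], int] = defaultdict(int)
    nbases: Dict[Tuple[str, str], int] = defaultdict(int)
    for header, _, qual in records:
        key = tile_key(header)
        qsum[key] += sum(ord(c) - offset for c in qual)
        nbases[key] += len(qual)
    overall = sum(qsum.values()) / sum(nbases.values())
    return {k for k in qsum if qsum[k] / nbases[k] < overall - max_drop}


def filter_by_tile(records: List[Record], max_drop: float = 5.0) -> List[Record]:
    """Keep only reads that come from tiles not flagged as bad."""
    bad = bad_tiles(records, max_drop)
    return [r for r in records if tile_key(r[0]) not in bad]


reads = [
    ("@M1:1:FC:1:1101:100:200 1:N:0:1", "ACGT", "IIII"),  # Q40
    ("@M1:1:FC:1:1102:100:200 1:N:0:1", "ACGT", "IIII"),  # Q40
    ("@M1:1:FC:1:2101:100:200 1:N:0:1", "ACGT", "####"),  # Q2, poor tile
]
print([tile_key(h)[1] for h, _, _ in filter_by_tile(reads)])  # ['1101', '1102']
```

Both mates of a pair come from the same tile. For paired-end data, you could therefore compute the bad-tile set once and apply it to both R1 and R2, so the files stay in sync. A single global mean threshold is a crude rule. A dedicated tool can also look at positional effects within tiles.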
BWA-MEM is a very good choice for longish reads that contain mismatches. I did an evaluation last year that could be useful for you: http://biorxiv.org/content/early/2016/05/16/053686