For my Illumina data, FastQC shows Ns at positions 13, 14, and 15 in 101 bp reads. Cropping the first 15 bases with Trimmomatic solves the problem, but I lose a lot of data. If I retain the Ns, what problems would they cause during alignment (BWA + Stampy) and variant calling (UnifiedGenotyper), and how can I handle them?
If anybody has faced a similar problem, how did you handle it?
Similar questions have been asked on other forums, but none have been answered.
I could not find a resource on how variant calling programs handle Ns. Do they ignore them, or treat them as variants with low confidence scores?
Here is the per-base N content plot from FastQC: http://i43.tinypic.com/sfyz5z.jpg
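To put numbers on that plot, the per-position N fraction can also be computed directly from the FASTQ file. This is only a rough sketch, not how FastQC itself does it: the function name `per_base_n_fraction` is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). The tiny in-memory example stands in for a real file.

```python
import io
from collections import Counter


def per_base_n_fraction(fastq_handle):
    """Return a list whose element i is the fraction of reads with 'N' at 0-based position i.

    Assumes each record is exactly 4 lines (header, sequence, '+', qualities).
    """
    n_counts = Counter()  # position -> number of reads with N at that position
    totals = Counter()    # position -> number of reads that cover that position
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 1:  # only the second line of each record is sequence
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq):
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    # Every read starts at position 0, so the keys of `totals` are contiguous.
    return [n_counts[pos] / totals[pos] for pos in range(len(totals))]


# Hypothetical two-read example; for real data, pass an open file handle instead.
example = io.StringIO("@r1\nACGTN\n+\nIIIII\n@r2\nACGNN\n+\nIIIII\n")
for pos, frac in enumerate(per_base_n_fraction(example), start=1):
    print(f"position {pos}: {frac:.2%} N")
```

Running this separately on each of the 3 runs would show how many reads are actually affected at positions 13 to 15, which helps when weighing cropping against keeping the Ns.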
Shouldn't you first investigate why you got those weird Ns at these positions?
They are possibly due to machine read errors during sequencing, and they appear in only 1 of 3 runs. I am looking for a way to handle them without losing a lot of sequence data.