I have Illumina HiSeq 2000 paired-end reads from RNA sequencing (CASAVA version 1.8). I have been told that the FASTQ quality encoding is Sanger format (Phred+33). The reads have already been quality checked and screened.
My question is: do they need any kind of grooming before I map them to a reference genome? I'm thinking of the FASTQ Groomer in Galaxy. Or is it fine to upload them as fastqsanger and map them straight away with TopHat?
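For reference, the encoding can be sanity-checked from the quality lines themselves before deciding whether to groom. The following is a minimal sketch, not a Galaxy or TopHat feature: the function names `read_quality_lines` and `guess_phred_offset` are made up for illustration, the ASCII thresholds are heuristics (Phred+33 data can contain characters below `;`, while Phred+64 data can contain characters above `J`), and the reader assumes plain four-line FASTQ records.

```python
import gzip
from itertools import islice


def read_quality_lines(path, max_reads=10000):
    """Yield the quality string of the first max_reads records.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The quality string is the 4th line of every 4-line record.
        for line in islice(handle, 3, max_reads * 4, 4):
            yield line.rstrip("\n")


def guess_phred_offset(quality_lines):
    """Return 33, 64, or None when the sample is ambiguous or empty."""
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters below ';' (ASCII 59) cannot occur in Phred+64 data.
        return 33
    if max(codes) > 74:
        # Characters above 'J' (ASCII 74) are not expected in Phred+33 data.
        return 64
    return None
```

Usage would look like `guess_phred_offset(read_quality_lines("reads_1.fastq"))`, where the file name is a placeholder. A result of `33` is consistent with fastqsanger; `None` means the sampled reads did not contain characters that distinguish the two encodings.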
Obi Griffith: Thank you for your answer. I searched beforehand but couldn't find any similar questions. A follow-up question: I tried grooming a few of my files with the input type set to Illumina 1.3-1.7, and flagstat on the resulting BAM files reports roughly 90% properly paired reads, whereas if I map my reads directly I get roughly 80% properly paired reads. Can anyone explain why this is? And is it wrong to groom them this way before mapping?
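To make the difference between the two input types concrete: Sanger encoding stores each score as ASCII code minus 33, while Illumina 1.3-1.7 stores it as ASCII code minus 64. The sketch below only shows that arithmetic on a made-up quality string; it does not reproduce what the Galaxy groomer does with out-of-range values, and it is not offered as an explanation of the flagstat difference.

```python
def decode_quality(quality, offset):
    """Convert a FASTQ quality string to integer scores for a given offset."""
    return [ord(ch) - offset for ch in quality]


qual = "IIIH#"  # hypothetical quality string, written in Phred+33

sanger = decode_quality(qual, 33)       # [40, 40, 40, 39, 2]
as_illumina = decode_quality(qual, 64)  # [9, 9, 9, 8, -29]

print(sanger)
print(as_illumina)
```

Read with the wrong offset, every score comes out 31 lower, and low-quality characters such as `#` fall below zero, which is outside the valid range for Illumina 1.3-1.7 data.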