Hello,
I have a dataset of short reads in which some fastq files are in the Illumina 1.5 format and others are in the Illumina 1.8 format. My plan is to align these reads using bwa mem, and later do SNP calling on the alignments.
The main difference between these two formats is how the phred scores are encoded (e.g. see http://en.wikipedia.org/wiki/FASTQ_format ). Thus, when I ran bwa aln on the Illumina 1.5 files, I had to use the -I option to indicate that the phred scores were encoded differently. I used to run something like:
bwa aln -I reference seq_illumina15.fastq.gz
bwa aln reference seq_illumina18.fastq.gz
However, the bwa mem documentation does not mention a -I option, or any other way to specify which version of the fastq format is used (http://bio-bwa.sourceforge.net/bwa.shtml ). So what is the correct way to specify how the phred scores are encoded in bwa mem?
Thank you very much for the answer. So I will have to convert the Illumina 1.5 files to 1.8 (or, in other words, phred+33 to phred+64) before running bwa mem.
I guess you mean phred+64 to phred+33? Phred+33 is the newer encoding (Illumina 1.8 and Sanger), just to prevent confusion.
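For what it's worth, here is a minimal sketch of that phred+64 to phred+33 conversion in Python. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines), and the function names are made up for illustration; they are not part of bwa or any other tool.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode one FASTQ quality line from phred+64 to phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # phred score under the +64 encoding
        if score < 0:
            # Characters below '@' cannot occur in phred+64 data
            raise ValueError(f"character {ch!r} is not valid phred+64")
        converted.append(chr(score + 33))  # same score, +33 encoding
    return "".join(converted)


def convert_fastq(lines):
    """Yield FASTQ lines with every fourth line (the qualities) re-encoded.

    Assumption: each record is exactly four lines long.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield phred64_to_phred33(line) if i % 4 == 3 else line
```

For a gzipped file, the lines could be read with gzip.open(path, "rt") and passed to convert_fastq. The ValueError is there on purpose: if a file that is already phred+33 gets fed in by mistake, it will usually fail instead of silently producing garbage qualities.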