@lh3: Thanks for the info. I am wondering whether I should do the data cleanup before or after converting the Illumina scores to Sanger scores. Please let me know your thoughts.
I was wondering: if you run a batch of mixed-format files (some Illumina, some Sanger) through maq or seqret to put them all in a fixed ASCII-33 format, will these two tools skip the files that are already in Sanger (ASCII-33) format and convert only the ASCII-64 files to ASCII-33?
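I can't say for certain what maq or seqret do with files that are already ASCII-33, so one cautious option is to check each file yourself and convert only the ones that need it. Here is a minimal sketch, assuming you have already pulled the quality lines out of four-line FASTQ records. The names `guess_offset` and `to_sanger` are made up for illustration, and the heuristic can guess wrong on a file that contains only high-quality reads:

```python
def guess_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from an iterable of quality strings."""
    lowest = min((ord(c) for line in quality_lines for c in line), default=None)
    if lowest is None:
        raise ValueError("no quality characters found")
    # ';' (ASCII 59) is the lowest character an offset-64 file can contain
    # (Solexa score -5), so anything below it implies offset 33.
    return 33 if lowest < 59 else 64


def to_sanger(quality, offset):
    """Shift one quality string to ASCII-33; offset-33 input is returned unchanged."""
    if offset == 33:
        return quality
    converted = []
    for c in quality:
        score = ord(c) - 64
        if score < 0:
            # Negative scores only exist on the old Solexa scale, which needs
            # a non-linear conversion rather than a plain shift.
            raise ValueError("negative score: looks like Solexa-scaled data")
        converted.append(chr(score + 33))
    return "".join(converted)
```

For example, `to_sanger("hhhB", guess_offset(["hhhB"]))` gives `"III#"`, while a string such as `"II#"` is detected as offset 33 and passed through untouched.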
updated 2.4 years ago by Ram (44k) • written 9.6 years ago by bioinfo (840)
Is there documentation for which FASTQ dialects seqret supports and how to reference those dialects in a seqret command? I'm not seeing it in the EMBOSS documentation.
Thanks, Brent. I noticed that you mentioned a potential issue with paired-end reads that is not handled by fastx_toolkit (Filtering Paired End Reads). Does this tool take care of that?
If you're just converting, not filtering, that won't be an issue. If you filter after the conversion (for whatever reason), then yes, you'll probably have to figure out how to make sure you keep either both reads of a pair or neither.
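To illustrate the "neither or both" point, here is a rough sketch of filtering mates together. It assumes the two files have already been read in the same order into `(name, sequence, quality)` tuples with ASCII-33 qualities. `filter_pairs` and `mean_quality` are hypothetical helper names, not part of any toolkit mentioned here:

```python
def mean_quality(qual, offset=33):
    """Average Phred score of one quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(pairs, keep):
    """Yield (read1, read2) only when both mates pass the `keep` predicate."""
    for read1, read2 in pairs:
        if keep(read1) and keep(read2):
            yield read1, read2


# Usage: drop a pair as soon as either mate has mean quality below 20.
pairs = [
    (("a/1", "ACGT", "IIII"), ("a/2", "ACGT", "IIII")),
    (("b/1", "ACGT", "IIII"), ("b/2", "ACGT", "####")),
]
kept = list(filter_pairs(pairs, lambda read: mean_quality(read[2]) >= 20))
```

Because the decision is made per pair, the two output files stay in sync.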
Thanks. I have posted another question on filtering. I am not sure whether filtering the reads before or after the QC makes much difference.
I'm not so familiar with C. If I want to use this script to convert FASTQ Illumina quality scores to FASTQ Sanger quality scores, what command should I run?
A quick and dirty solution would be to make a FASTA file containing Ns, or any other rarely occurring homopolymer sequence, of length equal to your read length. Align your FASTQ file against this reference with any quality-aware aligner such as BWA or Bowtie to get a BAM file (you can parallelize this for speed). You will need to tell the aligner that the input is ASCII-64 (for example, `-I` for `bwa aln` or `--phred64-quals` for Bowtie). By definition, quality scores in a BAM file are recorded as Sanger scores, and every read will be reported exactly once, as an unaligned read. You can then use Picard to get the FASTQ back from the BAM file with Sanger scores!
Note: there is a difference in the way quality scores are recorded in Illumina FASTQ files before and after version 1.3 of CASAVA.
For anyone dealing with the various FASTQ encoding schemes and looking to sanity-check their conversion method, or who just wants to look up the Phred score for a specific ASCII code under one of the schemes: I have found this blog entry extremely useful. It provides "Sanger (and Illumina 1.3+ (and Solexa)) Phred Score (Q) ASCII Glyph Base Error Conversion Tables".
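If you'd rather compute a value than look it up, the same relationships can be sketched in a few lines. The function names below are my own. The formulas are the standard ones: Q = ASCII code minus the offset, P = 10^(-Q/10), and the usual Solexa-to-Phred mapping.

```python
import math


def phred_from_char(ch, offset=33):
    """Score encoded by one quality character (offset 33 for Sanger, 64 for Illumina 1.3+)."""
    return ord(ch) - offset


def error_probability(q):
    """Base-call error probability for a Phred score q."""
    return 10 ** (-q / 10)


def solexa_to_phred(q_solexa):
    """Convert an old Solexa-scaled score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

For instance, `'I'` under offset 33 and `'h'` under offset 64 both decode to Q40, and `error_probability(20)` is 0.01. The Solexa and Phred scales only differ noticeably at low scores, which is why a plain shift is not enough for pre-1.3 data.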
fq_all2std.pl is outdated...
What does that mean? Is the script not suitable?
I think converting to the Sanger scale should be the first step.
Thanks @lh3 !!