7.8 years ago
diltsjeri
Hi,
If I'm given a FASTQ file, how can I tell whether it's in fastqsanger format or not? Is there a command-line tool available for this? I'd really rather not have to go into Galaxy just to check its format.
Thanks!
A: Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?
head -n 40 file.fastq | awk '{if(NR%4==0) printf("%s",$0);}' | od -A n -t u1 -v | awk 'BEGIN{min=100;max=0;}{for(i=1;i<=NF;i++) {if($i>max) max=$i; if($i<min) min=$i;}}END{if(max<=74 && min<59) print "Phred+33"; else if(max>73 && min>=64) print "Phred+64"; else if(min>=59 && min<64 && max>73) print "Solexa+64"; else print "Unknown score encoding!";}'
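For readability, here is the same idea wrapped in a shell function, as a minimal sketch rather than a definitive tool. The function name detect_fastq_encoding, the optional read-count argument, and its default of 1000 reads are my own assumptions, not part of the original answer. The thresholds are copied from the one-liner above. It assumes an uncompressed FASTQ with exactly four lines per record. The logic rests on the ASCII ranges: Phred+33 quality characters start at byte 33, Phred+64 at byte 64, and Solexa+64 can go down to byte 59.

```shell
# Hypothetical helper: guess a FASTQ file's quality encoding from the
# smallest and largest byte values seen on its quality lines.
#   $1: FASTQ file (uncompressed, four lines per record)
#   $2: number of reads to sample (optional; 1000 is an arbitrary default)
detect_fastq_encoding() {
    head -n "$(( ${2:-1000} * 4 ))" "$1" |
    # Keep only every 4th line (the quality string) and join them together.
    awk 'NR % 4 == 0 { printf("%s", $0) }' |
    # Print each byte as an unsigned decimal; -v stops od from collapsing
    # repeated output lines into "*".
    od -A n -t u1 -v |
    awk '
        BEGIN { min = 256; max = 0 }
        {
            for (i = 1; i <= NF; i++) {
                if ($i > max) max = $i
                if ($i < min) min = $i
            }
        }
        END {
            if (max == 0)                                print "No quality data"
            else if (max <= 74 && min < 59)              print "Phred+33"
            else if (max > 73 && min >= 64)              print "Phred+64"
            else if (min >= 59 && min < 64 && max > 73)  print "Solexa+64"
            else                                         print "Unknown score encoding"
        }'
}
```

Call it as `detect_fastq_encoding file.fastq`, or `detect_fastq_encoding file.fastq 10000` to sample more reads. A larger sample makes the guess more reliable, because a handful of uniformly high-quality reads can fall inside the ranges of more than one encoding.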
Also, for clarification: is it safe to assume that all the latest next-gen technologies (NextSeq, HiSeq, Ion Torrent, etc.) output fastqsanger as the .fastq? Or should I just never assume anything?
Assume it but make sure ;)