I want to subset 80M paired-end reads from fastq.gz files that contain 120M paired-end reads. I used reformat.sh, and here is my command:
$REFORMAT/reformat.sh in1=B4337_4_R1.fastq.gz in2=B4337_4_R2.fastq.gz out1=$OUT/B4337_4_80M_R1.fastq.gz out2=$OUT/B4337_4_80M_R2.fastq.gz samplereadstarget=80000000
After running the script, I found that 80M reads were sampled, but all the quality scores have been changed:
For example, the old fastq.gz file contains this read:
@K00208:YAP076:7:1101:18.42:12.06#0/1
NGTTAAGAGCATGAATCTTACTACTGAATGATCTTAAACAAGTTACAGCAGGTCCTCAAACGACATCAGTTTCATTCAACACTGTTTTCTTATGATTTTGA
+
@```[ieeeiiiiieieeiiiiiiiieiieeeiiiiiiiiiieiiiiiiie``eeiiiiieVeiee`iieeeeiieiii`ieee``iieieii`````iii
In the new fastq.gz file, the same read looks like this:
@K00208:YAP076:7:1101:18.42:12.06#0/1
NGTTAAGAGCATGAATCTTACTACTGAATGATCTTAAACAAGTTACAGCAGGTCCTCAAACGACATCAGTTTCATTCAACACTGTTTTCTTATGATTTTGA
+
!AAA<JFFFJJJJJFJFFJJJJJJJJFJJFFFJJJJJJJJJJFJJJJJJJFAAFFJJJJJF7FJFFAJJFFFFJJFJJJAJFFFAAJJFJFJJAAAAAJJJ
This is very weird to me, and it interferes with my next step, which is to use NGmerge.sh to trim the adaptors. Does anyone know what may be wrong with my code? How may I fix the issue? Thanks, Jenny
While the in-line help does say what you quoted above, that text does NOT mean that this is what reformat.sh does by default. It just means that you can change the encoding if you want to.
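For what it's worth, the two quality lines in the question appear to carry the same scores, just written with a different ASCII offset: the old file looks like Phred+64 and the new one like Phred+33. Here is a minimal Python sketch to check that on the first ten quality characters of the read above. The helper names `convert_quality` and `phred_scores` are made up for this illustration and are not part of reformat.sh.

```python
def phred_scores(qual: str, offset: int) -> list[int]:
    """Decode a FASTQ quality string into numeric Phred scores."""
    return [ord(c) - offset for c in qual]


def convert_quality(qual: str, from_offset: int = 64, to_offset: int = 33) -> str:
    """Re-encode a quality string from one ASCII offset to another."""
    return "".join(chr(ord(c) - from_offset + to_offset) for c in qual)


# First ten quality characters of the read quoted in the question.
old_prefix = "@```[ieeei"   # from the original file (assumed Phred+64)
new_prefix = "!AAA<JFFFJ"   # from the subsampled file (assumed Phred+33)

# Shifting the old string from offset 64 to offset 33 should give the new one.
print(convert_quality(old_prefix))

# The decoded scores should be identical under the two assumed offsets.
print(phred_scores(old_prefix, 64) == phred_scores(new_prefix, 33))
```

If that holds for your data, the scores were re-encoded rather than altered. I believe the BBTools help lists `qin` and `qout` options for controlling the input and output offsets, but check the in-line help of your version before relying on that.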