Hello guys!
I interleaved my R1.fastq and R2.fastq files into a single file called interleaved.fastq, so that for each read pair the R1 read comes immediately before the R2 read, followed by the R1 read of the next pair, and so on. Each header in interleaved.fastq also carries barcode information (e.g. BX:Z:ACTGTCAATGTCAACT-1). The file looks like this:
@HX6_24184:8:2115:12337:28031 BX:Z:CGAGCACCATCGGTTA-1
TTCATTTTTATCGTTTTCCGTTCCTGTTGTTCAAAGCATCTTTATCTTCCGCACAGCCTCTTTTTAAGCCTATGATATAAGGGTGCGGTAAATTTACTCTCTGCAAGCCTTTCCCTTAGCGGCTGAAGACTGACAAGTCTGTACAGATCAT
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFJ-JJFJJJJJJJJJJAFJJJJJAFJJJJJJJJJJF77
@HX6_24184:8:2115:12337:28031 BX:Z:CGAGCACCATCGGTTA-1
AGGTTTTTTGGGCGTGAACAGGTAATAGTCGTTGTCCTTTTCTTGTTTAAAAATTTCTTTAAGAAAAGTTCTGCTATAATTTCCCAAACCTGTCTTGTTAAAGAAGGTACGTTTGGCTTCATATCCA
+
AFFFFJFJJFJFFJFJFJF-FF<--<JJFJJJJJJ-7FFFJ<JJFJJF-FJJFFFJAFJJJJJJF-AJJJFAJ-7<A-F<FF7FAFJ-A-77<7FFFJJ<<-777<F7--A7<FF7<A<-<-AAF-A
@HX6_24184:8:2109:23196:7462 BX:Z:ACACTGAAGAGACGAA-1
AGTTTTTTTATCGGTAGATAAAAAAACTTCACTCAACGATGCGTTGCGCACACATAATGTGGCGGTTTAGAACTTATTGCGCTTTTTATGAGTCAACTTTCCGGTTATAAAATTGGATATGAAGCCAAACGTACCTTCTTTAACAAGAC
+
AAFFFJJJJJJFJJFJJAJFJ<JFFJFFJJAFFJJFJJJFJJJJFJ<FJJJFAFJFJJFFFJJJJ-AAFFJJFFJFFAFFJ<FJJAFJJJJJJ7AFJFFJJ<FAAJ7JFJJFAFAJ<FJJJF<FAAFJJJ<JJJF-AAJJJJA<7-<F<
@HX6_24184:8:2109:23196:7462 BX:Z:ACACTGAAGAGACGAA-1
TTGTTTTTTTGTCGGAGTTACTACTATTGCAAAAATAGCAGATAGTGCCTATAGATATACAAATAATAGTAATTCAAGATATGGGTTTGTTGACATAATATTACAACTTGTATCACAGACAAAAGATT
+
FJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJFJJAJFJJJJJJJJJJJJJJJJJJJJJJJJJJJ
What I would like to do is sort interleaved.fastq by the barcode (the BX:Z:... tag) while keeping the interleaved format, i.e. each forward read immediately followed by its reverse read. In other words, I want an interleaved_barcode_sorted.fastq file!
I tried this:
cat interleaved.fastq | paste - - - - | sort -k2,2 -t " " | tr "\t" "\n" > interleaved_barcode_sorted.fastq
It partially works: the records are sorted correctly by barcode, but the reads no longer come in forward-reverse order, so the result is no longer an interleaved FASTQ file.
Any ideas?
Thanks!
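One possible approach, offered as a sketch rather than a definitive recipe: the command above puts each read on its own row (4 lines per row), and because the only space in a row is in the header, the space-delimited sort key contains the barcode plus the sequence that follows it. Two mates with the same barcode are therefore ordered by sequence, not by their original R1/R2 order. Joining 8 lines per row instead keeps each pair together as one unit. The function name sort_interleaved_by_barcode is made up for illustration, and the sketch assumes every record is exactly 4 lines, both mates share the same barcode, the BX:Z: tag is the second whitespace-separated token of the header, and the file contains no tab characters. It also relies on sort -s (stable sort), which GNU and BSD sort provide but POSIX does not require.

```shell
# Sort an interleaved FASTQ by barcode while keeping each R1/R2 pair adjacent.
# Assumptions (not checked): 4 lines per record, mates share one barcode,
# the barcode tag is the 2nd whitespace-separated token of the R1 header,
# and no line contains a tab.
# Usage: sort_interleaved_by_barcode < interleaved.fastq > interleaved_barcode_sorted.fastq
sort_interleaved_by_barcode() {
    # Join 8 lines (R1 record + R2 record) into one tab-separated row.
    paste - - - - - - - - |
        # Prepend the barcode from the R1 header as an explicit sort key.
        awk -F '\t' '{ split($1, h, " "); print h[2] "\t" $0 }' |
        # Stable sort on the key column only; pairs with equal barcodes keep input order.
        LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 |
        # Drop the key column, then put every field back on its own line.
        cut -f2- |
        tr '\t' '\n'
}
```

Sorting on an explicit key column, rather than on a space-delimited field of the joined row, keeps the sequence and quality strings out of the comparison, so pairs that share a barcode stay in their original order.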
Amazing, now it works! Thank you so much! :)