I want to extract the reads from a FASTQ file whose first 8 nt match the pattern "NNNNNGGG", and save them as a FASTQ file for downstream alignment. For example, given the three reads below:
@SRR8105603.9 NS500418:833:HNY2CBGX5:1:11101:10498:1122 length=76
GCAGGGGGACCCCATCTCTACTAAAAATACAAAAATTAGACAGACGTGATGGGGCATTTCTCTAATCCCAGCTACT
+SRR8105603.9 NS500418:833:HNY2CBGX5:1:11101:10498:1122 length=76
AAAAA//EEEEEEE/EEEA/EEEE/E/E6EEAEAE/6EE/E///A/AA</A<</6<A<6EE/A6AEEEE//E6AE/
@SRR8105603.10 NS500418:833:HNY2CBGX5:1:11101:23704:1138 length=75
TGCTCGCGGATCGCTTGAGTCCAGGAGTTCAAGACCAGCCTGGGTAACATGGCAAAACCTCATCTCTACAAAAAA
+SRR8105603.10 NS500418:833:HNY2CBGX5:1:11101:23704:1138 length=75
AAAAAA//EAEAEAEE/AAEEEAEEEEAEEEAEEAEEEA<A<</EA/E6//<EAEE<AAE<EEAAEEEE///</<
@SRR8105603.11 NS500418:833:HNY2CBGX5:1:11101:26002:1139 length=76
TGCGAGGGGCAAGTTCCTGTTTCCAAACAACAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGGGAGGG
+SRR8105603.11 NS500418:833:HNY2CBGX5:1:11101:26002:1139 length=76
In the 1st and 3rd reads, nt 1-8 match the "NNNNNGGG" pattern (GCAGGGGG and TGCGAGGG, respectively), so the extracted output would be:
@SRR8105603.9 NS500418:833:HNY2CBGX5:1:11101:10498:1122 length=76
GCAGGGGGACCCCATCTCTACTAAAAATACAAAAATTAGACAGACGTGATGGGGCATTTCTCTAATCCCAGCTACT
+SRR8105603.9 NS500418:833:HNY2CBGX5:1:11101:10498:1122 length=76
AAAAA//EEEEEEE/EEEA/EEEE/E/E6EEAEAE/6EE/E///A/AA</A<</6<A<6EE/A6AEEEE//E6AE/
@SRR8105603.11 NS500418:833:HNY2CBGX5:1:11101:26002:1139 length=76
TGCGAGGGGCAAGTTCCTGTTTCCAAACAACAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGGGAGGG
+SRR8105603.11 NS500418:833:HNY2CBGX5:1:11101:26002:1139 length=76
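One way to do this filtering is sketched below in Python. It is only a minimal sketch under some assumptions: every record is exactly 4 lines (no wrapped sequences), bases are uppercase, and "N" in the pattern means any of A, C, G, T or N. The names filter_fastq and extract, and the file names in the usage comment, are hypothetical and chosen just for illustration.

```python
import re
from itertools import islice

# First 5 positions may be any base; positions 6-8 must be GGG.
# Assumes uppercase sequence lines.
PATTERN = re.compile(r"[ACGTN]{5}GGG")


def filter_fastq(lines, pattern=PATTERN):
    """Yield 4-line FASTQ records whose sequence starts with the pattern."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', quality
        if len(record) < 4:
            break  # end of input, or a truncated final record
        # re.match anchors at the start of the sequence line
        if pattern.match(record[1]):
            yield record


def extract(in_path, out_path):
    """Copy matching records from in_path to out_path (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in filter_fastq(src):
            dst.writelines(record)


# Example call (file names are placeholders):
# extract("reads.fastq", "reads_NNNNNGGG.fastq")
```

Because the records are read lazily four lines at a time, this should work on large files without loading them into memory. A final record with fewer than 4 lines is skipped rather than written out.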
It sounds like the real issue is that the files did not demultiplex correctly with bcl2fastq?