Dear All,
I have Illumina sequencing data in 20 large FASTQ files. Each read is 22 nucleotides long, and I would like to remove the first 4 nucleotides from every read in the files.
For example, I would like to turn this:
@D5N3XBQ1:129:C0T9LACXX:1:1101:1227:2122 1:N:0:
CATGATTTGATATTTAGGGCTT
+
HIFHIEGHIIFHGIIGHIIIDH
@D5N3XBQ1:129:C0T9LACXX:1:1101:1150:2163 1:N:0:
CATGATGACATAGAAATAATTT
+
IIFIIIIIIIIIIIFIFIIIFI
@D5N3XBQ1:129:C0T9LACXX:1:1101:1155:2248 1:N:0:
CATGAAGACAAAGCCTCTATGA
into this:
@D5N3XBQ1:129:C0T9LACXX:1:1101:1227:2122 1:N:0:
ATTTGATATTTAGGGCTT
+
HIFHIEGHIIFHGIIGHIIIDH
@D5N3XBQ1:129:C0T9LACXX:1:1101:1150:2163 1:N:0:
ATGACATAGAAATAATTT
+
IIFIIIIIIIIIIIFIFIIIFI
@D5N3XBQ1:129:C0T9LACXX:1:1101:1155:2248 1:N:0:
AAGACAAAGCCTCTATGA
I am new to bioinformatics and would appreciate a few pointers on the best way to get this done with the command line in Linux. Thanks, Lisanne
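A minimal Python sketch of the per-record operation follows. It assumes every record spans exactly four lines (header, sequence, `+`, quality), and the function name `trim_record` is hypothetical. Unlike the desired output above, it also drops the first 4 quality characters. In FASTQ the quality string must be the same length as the sequence, so trimming only the sequence would leave records that most downstream tools reject.

```python
def trim_record(header, seq, plus, qual, n=4):
    """Drop the first n bases from one FASTQ record (hypothetical helper).

    The quality string is trimmed by the same amount so that it stays
    the same length as the sequence, as the FASTQ format requires.
    """
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch in {header}")
    return header, seq[n:], plus, qual[n:]


# First record from the example in the question.
record = (
    "@D5N3XBQ1:129:C0T9LACXX:1:1101:1227:2122 1:N:0:",
    "CATGATTTGATATTTAGGGCTT",
    "+",
    "HIFHIEGHIIFHGIIGHIIIDH",
)
trimmed = trim_record(*record)
print("\n".join(trimmed))
```

Slicing with `seq[n:]` never raises, so a read shorter than `n` simply becomes empty.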
Edited my answer; presumably you also want to remove the first 4 characters of each quality string?
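For large files, trimming the sequence and quality lines together can be done as a stream, so the whole file never has to fit in memory. The sketch below is one possible way to do that, not a standard tool. The script name `trim_fastq.py` and the functions `open_fastq` and `trim_fastq` are hypothetical. It assumes 4-line FASTQ records and treats a `.gz` extension as gzip-compressed input.

```python
import gzip
import sys
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file for text reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def trim_fastq(handle, out, n=4):
    """Stream 4-line FASTQ records from handle to out, removing the first
    n bases and the first n quality characters of every record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not any(record):
            break  # end of file (or only a trailing blank line left)
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {record!r}")
        header, seq, plus, qual = record
        out.write(f"{header}\n{seq[n:]}\n{plus}\n{qual[n:]}\n")


# Usage: python trim_fastq.py N input.fastq[.gz] > output.fastq
# (runs only when command-line arguments are supplied)
if __name__ == "__main__" and len(sys.argv) == 3:
    n_bases = int(sys.argv[1])
    with open_fastq(sys.argv[2]) as fh:
        trim_fastq(fh, sys.stdout, n_bases)
```

To process all 20 files, it can be run once per file from a shell loop, e.g. `for f in *.fastq.gz; do python trim_fastq.py 4 "$f" > "${f%.fastq.gz}.trim.fastq"; done`. Established tools also do this job: cutadapt's `-u 4` option, for instance, removes a fixed number of bases from the start of each read and trims the qualities with them.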