I am using paired-end SRA data; it looks like this:
@HWI-ST915_0064:2:1101:1420:2104/1
GTCTCTTCGCACGCTTTCACTGTGAACGGTTCGGCATCGAGAAGGACGCAGTTCCTCTCCGGCTTGGACCAGTTTCTGGTGGCCACGGCTGCCCCCATCC
+SRR1188607.1 HWI-ST915_0064:2:1101:1420:2104 length=100
HDHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHFHFFHHHHHHHHEHEHHHHHHHGHHHED?EE=A@BACDDECCE@DB74?############
@HWI-ST915_0064:2:1101:1498:2108/1
AAAGATTGCAATGGAGGAGAAAGGGAAGACCCTGCCTGAAGAAATGCAATTGATAAATAAGTTGTTGTCTGAGGAAAAGGGTTCGGAGAGGATGAGAATG
+SRR1188607.2 HWI-ST915_0064:2:1101:1498:2108 length=100
GFHHHHHHHHHHHHHHHHHHHHHGHGHHHHHHHHHHGHHHHGHHHHHHDHGGHHFHHHHHGEGGFGGFGFHFHHDHHGHHGFGFGHGFHHHHHHHHHHHH
The other read file is similar, except that the read names end with /2.
For further analysis I need the description on line three (after the +) to match the description on line one. However, line three contains extra information, such as SRR1188607.1, SRR1188607.2, etc. How can I get rid of this so that it matches the line-one description? Also, how can I delete everything after the +?
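A minimal sketch of both options in plain Python, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence or quality lines). The names `fix_plus_lines` and `rewrite_fastq` are hypothetical helpers written for this example. By default the third line of every record becomes a bare `+`; with `copy_title=True` it instead becomes `+` followed by the title from line one, so the /1 or /2 suffix is carried over automatically for each mate file.

```python
from typing import Iterable, Iterator


def fix_plus_lines(lines: Iterable[str], copy_title: bool = False) -> Iterator[str]:
    """Yield FASTQ lines with the third line of every record rewritten.

    Assumes plain 4-line records (hypothetical helper, not a library API).
    copy_title=False -> third line becomes a bare '+'.
    copy_title=True  -> third line becomes '+' plus the title from line one.
    """
    title = ""
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        pos = i % 4
        if pos == 0:
            # Header line: remember the title (everything after '@').
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected '@' header, got {line!r}")
            title = line[1:]
            yield line
        elif pos == 2:
            # Separator line: drop whatever follows '+' and rebuild it.
            if not line.startswith("+"):
                raise ValueError(f"line {i + 1}: expected '+' separator, got {line!r}")
            yield ("+" + title) if copy_title else "+"
        else:
            # Sequence and quality lines pass through unchanged.
            yield line


def rewrite_fastq(in_path: str, out_path: str, copy_title: bool = False) -> None:
    """Apply fix_plus_lines to a whole (uncompressed) FASTQ file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in fix_plus_lines(src, copy_title=copy_title):
            dst.write(line + "\n")
```

For example, `rewrite_fastq("reads_1.fastq", "reads_1.fixed.fastq")` (file names are placeholders) would turn `+SRR1188607.1 HWI-ST915_0064:2:1101:1420:2104 length=100` into a bare `+`. Emptying the line is usually the safer choice, since it also shrinks the file.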
Thanks
I wonder why you need to do this; most programs ignore the third line.
For example, the Biopython parser throws an error if the third line has text after the + that is not equal to the title on the first line. AFAIK, by convention the text after the + must either repeat the first line's title exactly or be absent.
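To make that convention concrete, here is a small sketch of the rule as a standalone check. It is an illustration written for this thread, not Biopython's actual code; `plus_line_ok` and `bad_records` are hypothetical names, and 4-line records are assumed.

```python
from typing import Iterable, List


def plus_line_ok(header: str, plus: str) -> bool:
    """Return True if a '+' line follows the usual FASTQ convention.

    The separator may be a bare '+', or '+' followed by exactly the
    same title that follows '@' on the header line.
    """
    if not header.startswith("@") or not plus.startswith("+"):
        return False
    repeated = plus[1:].rstrip("\r\n")
    return repeated == "" or repeated == header[1:].rstrip("\r\n")


def bad_records(lines: Iterable[str]) -> List[int]:
    """Return 1-based record numbers whose '+' line breaks the convention.

    Assumes 4-line records (hypothetical helper, no wrapped lines).
    """
    lines = list(lines)
    bad = []
    for start in range(0, len(lines) - 3, 4):
        if not plus_line_ok(lines[start], lines[start + 2]):
            bad.append(start // 4 + 1)
    return bad
```

Under this rule, every record in the files shown in the question would be flagged, because `SRR1188607.1 HWI-ST915_0064:2:1101:1420:2104 length=100` differs from `HWI-ST915_0064:2:1101:1420:2104/1`.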