Dear all,
I have paired-end reads generated by an ABI SOLiD 4 system, in two FASTQ files, R1.fastq and R2.fastq. Looking at the two files, I found that the read names (headers) do not match between them, as shown below. This causes problems in the analysis, for example when trimming the reads with cutadapt.
R1.fastq:
@SRR3159522.1 2_33_78 length=50
GGGATCAAAGGTGCCTAAGAAAGTTCTCACTAAGGGNATCTTCTACGCC
+SRR3159522.1 2_33_78 length=50
CCCDFFFFHHHHHJJJGJJJJJJJJIIGIIIIIJJJ#1?CGHDHHGIJI
@SRR3159522.2 2_36_51 length=50
CTGGTGCGAAAAGGTGAAATAAAAAAGAAGAACGAAGAAGCCGGTGCCA
+SRR3159522.2 2_36_51 length=50
BBCFDFFFHHHHHJGHHIJIJJJJJJIGIIJJIIIJJIGGIJJJHIHHH
@SRR3159522.3 2_36_551 length=50
CCACACCGGGTAAGCTGGTTTGGCGATGCGGGATGATCCGAACGTGGAG
...
...
R2.fastq:
@SRR3159522.27470956 2_33_78 length=35
TGTTTNNNNNNNNNNNNAAATGCCAGATCCACAA
+SRR3159522.27470956 2_33_78 length=35
BCBFF############23AGHHHIJJIHIJJJJ
@SRR3159522.27470957 2_36_51 length=35
GTATGCTCCGTNANAGTCTACCAGCACTGACCAG
+SRR3159522.27470957 2_36_51 length=35
BB@FFFFFHHH#2#3AEHIJJIIJJIJJJJJIJJ
@SRR3159522.27470958 2_36_551 length=35
GTCCTGNTNNNNNNNTGAACCAACACCTTTTGTG
...
...
As you can see, the headers of the paired reads differ between the two files.
When I used cutadapt to trim the reads, I got a name-matching error. I would like to replace the read names in the R2.fastq headers with those from R1.fastq so that the headers match, but I don't know how to do it. I want to transform R2.fastq as follows:
@SRR3159522.1 2_33_78 length=35
TGTTTNNNNNNNNNNNNAAATGCCAGATCCACAA
+SRR3159522.1 2_33_78 length=35
BCBFF############23AGHHHIJJIHIJJJJ
@SRR3159522.2 2_36_51 length=35
GTATGCTCCGTNANAGTCTACCAGCACTGACCAG
+SRR3159522.2 2_36_51 length=35
BB@FFFFFHHH#2#3AEHIJJIIJJIJJJJJIJJ
@SRR3159522.3 2_36_551 length=35
GTCCTGNTNNNNNNNTGAACCAACACCTTTTGTG
...
...
Can someone help me?
The "+" line has to either match the read name exactly or be empty.
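For anyone landing here later, one possible way to do the renaming is sketched below in Python. It is only a sketch under some assumptions: both files use plain 4-line FASTQ records, the mates are in the same order in R1.fastq and R2.fastq, and the second header field (e.g. 2_33_78) identifies the pair. The function names and the output file name are made up for illustration. It copies the read ID from R1 into the R2 header, keeps the rest of the R2 header (so length=35 stays), and rewrites the "+" line to match the new header.

```python
import itertools


def rename_r2_headers(r1_lines, r2_lines):
    """Yield R2 FASTQ lines with the read ID taken from the matching R1 record.

    Assumes 4-line records and identical mate order in both inputs.
    """
    r1_iter, r2_iter = iter(r1_lines), iter(r2_lines)
    while True:
        r1_rec = list(itertools.islice(r1_iter, 4))
        r2_rec = list(itertools.islice(r2_iter, 4))
        if not r1_rec and not r2_rec:
            break  # both inputs ended together
        if len(r1_rec) != 4 or len(r2_rec) != 4:
            raise ValueError("record counts differ or a record is truncated")

        r1_fields = r1_rec[0].split()  # e.g. ['@SRR3159522.1', '2_33_78', 'length=50']
        r2_fields = r2_rec[0].split()
        # Sanity check: the second field should be the same for both mates.
        if r1_fields[1:2] != r2_fields[1:2]:
            raise ValueError(f"mates out of order: {r1_rec[0]!r} vs {r2_rec[0]!r}")

        # R1 read ID + remaining R2 fields (keeps the R2 length annotation).
        new_header = " ".join([r1_fields[0]] + r2_fields[1:])
        yield new_header + "\n"
        yield r2_rec[1].rstrip("\n") + "\n"   # sequence, unchanged
        yield "+" + new_header[1:] + "\n"     # "+" line matches the new name
        yield r2_rec[3].rstrip("\n") + "\n"   # qualities, unchanged


def rename_r2_file(r1_path, r2_path, out_path):
    """Write a renamed copy of r2_path to out_path (hypothetical helper)."""
    with open(r1_path) as r1, open(r2_path) as r2, open(out_path, "w") as out:
        out.writelines(rename_r2_headers(r1, r2))
```

It could be called as rename_r2_file("R1.fastq", "R2.fastq", "R2.renamed.fastq"). The script stops with an error instead of guessing if the two files get out of sync, since silently mispairing reads would be worse than failing.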
Thank you rpolicastro, it works fine.