Question

Find mate reads in two different paired-end read files.

2.1 years ago

artist01 • 0

Hey guys,

I am quite new to this field and have recently been given an exercise in which I get two files from an NGS machine, in FASTQ format, corresponding to the forward and reverse reads. The exercise consists of finding the mate reads in both files.

Should I just match the whole name across the two files using the sequence identifier, i.e., @SEQ_ID? Or is there a faster way in which I do not need to match the complete name, perhaps only the run ID (even if I do not know how the sequence ID is going to be formatted)?

NGS • 855 views

updated 20 months ago by Ram 44k • written 2.1 years ago by artist01 • 0

score 1 · Answer 1 · 2022-10-30

The mate/paired reads can simply be identified by order, as long as the two files have not been filtered or sorted independently of each other. E.g., the 3rd read in the R1 file and the 3rd read in the R2 file are mates, and the same holds for the 1000th read, the 15347th read, etc. There is no need to match the read IDs. There is a convention of naming mates readid123/1 and readid123/2, but it is not always followed.
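
To make the "pair by order" idea concrete, here is a minimal Python sketch. It walks both files in lockstep, four lines per record, and treats the n-th record of each as mates. The function names (`read_fastq`, `base_id`, `pair_by_order`) are made up for illustration, and the ID comparison is only an optional sanity check that assumes the mate name is the first whitespace-separated token of the header, possibly with a trailing /1 or /2. Real headers may not follow that layout.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes the plain 4-lines-per-record layout; a trailing incomplete
    record is silently dropped.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def base_id(header):
    """Return the read name without '@', description, or a /1 or /2 suffix.

    Assumption: the name is the first whitespace-separated token.
    """
    name = header.lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_by_order(r1_lines, r2_lines):
    """Yield (r1_record, r2_record) mates, matched purely by position."""
    pairs = zip_longest(read_fastq(r1_lines), read_fastq(r2_lines))
    for index, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is None or rec2 is None:
            raise ValueError(f"files differ in length at record {index}")
        # Optional sanity check: names should agree if the files are in sync.
        if base_id(rec1[0]) != base_id(rec2[0]):
            raise ValueError(f"read names differ at record {index}")
        yield rec1, rec2


if __name__ == "__main__":
    # Tiny in-memory example instead of real files.
    r1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]
    r2 = ["@read1/2\n", "TTTT\n", "+\n", "IIII\n"]
    for mate1, mate2 in pair_by_order(r1, r2):
        print(mate1[0], mate2[0])
```

With real data, the same function should work on open file handles, e.g. `open(path)` for plain text or `gzip.open(path, "rt")` for compressed files, since both iterate line by line. If the name check fails, the files have probably been filtered or reordered separately, and matching by ID (or re-syncing the files with a dedicated tool) would be needed instead.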