Find mate reads in two different paired-end read files.
2.1 years ago
artist01 • 0

Hey guys,

I am quite new to this field and was recently given an exercise in which I get two files from an NGS machine, in FASTQ format, corresponding to the forward and the reverse reads. The exercise consists of finding the mate reads across the two files.

Should I just match the whole name across the two files using the sequence identifier, i.e., @SEQ_ID? Or is there a faster way in which I do not need to match the complete name, only the run ID perhaps (even though I do not know in advance how the sequence ID will be formatted)?

NGS • 855 views
2.1 years ago
liorglic ★ 1.5k

Mate (paired) reads can be identified simply by their order: the 3rd read in the R1 file and the 3rd read in the R2 file are mates, and the same holds for the 1000th read, the 15347th read, and so on. There is no need to match the read IDs, provided both files come straight from the sequencer or were processed together so that they stay in sync. There is a convention of naming mates readid123/1 and readid123/2, but it is not always followed.
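To illustrate pairing by order, here is a minimal Python sketch. It assumes plain (uncompressed) FASTQ with exactly four lines per record. The function names read_fastq, base_id and pair_by_order are made up for this example. The ID comparison is an optional sanity check I added, not something required for pairing; it only handles the /1 and /2 suffix convention and would need adjusting for other naming schemes.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            # End of file (or a blank line, which this sketch treats as the end).
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Keep only the first whitespace-delimited token, without the leading '@'.
        read_id = header[1:].split()[0]
        yield read_id, seq, qual


def base_id(read_id):
    """Strip a trailing /1 or /2 so both mates share the same name."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def pair_by_order(r1_handle, r2_handle):
    """Yield (mate1, mate2) by position; raise if the files are out of sync."""
    pairs = zip_longest(read_fastq(r1_handle), read_fastq(r2_handle))
    for n, (rec1, rec2) in enumerate(pairs, start=1):
        if rec1 is None or rec2 is None:
            raise ValueError(f"files differ in length at record {n}")
        if base_id(rec1[0]) != base_id(rec2[0]):
            raise ValueError(f"ID mismatch at record {n}: {rec1[0]} vs {rec2[0]}")
        yield rec1, rec2


# Tiny in-memory example standing in for the R1 and R2 files.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTAA\n+\nIIII\n@read2/2\nCCGG\n+\nIIII\n")
for mate1, mate2 in pair_by_order(r1, r2):
    print(mate1[0], mate2[0])
```

With real data you would pass open file handles instead of the StringIO objects. If the check raises, the two files were probably filtered or sorted separately, and pairing by order is no longer safe.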
