9.6 years ago
Endre Bakken Stovner
I have a bam file for a chromosome sorted by read names. For some mate pairs I get output like this (cropped):
42629192 42629262 2PJ3LS1:183:C5RR7ACXX:4:1101:21144:7110/2
43729562 43729595 2PJ3LS1:183:C5RR7ACXX:4:1101:21144:7110/1
78061166 78061267 2PJ3LS1:183:C5RR7ACXX:4:1101:21144:7110/2
I guess none of these should really be considered to be in a pair since they are so far apart. But this leads to the question, how do you know that two reads belong together in a pair? Is this metadata in the bam file somewhere?
If the three reads above aligned much closer together, it would be hard to tell which two made up the pair and which was the odd man out, right?
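On the metadata question: each BAM record has a FLAG field whose bits say whether the read is paired, whether it is the first or second mate, and whether the alignment is secondary or supplementary (RNEXT/PNEXT additionally give the mate's position). Here is a minimal sketch of decoding those bits, assuming you still have the integer FLAG from the original BAM rather than only the BED output. The function name `describe_flag` is made up for illustration; the bit values are the ones defined in the SAM specification.

```python
# SAM FLAG bits relevant to pairing (values from the SAM specification).
PAIRED = 0x1           # read is part of a pair
PROPER_PAIR = 0x2      # aligner considers both mates properly aligned
FIRST_IN_PAIR = 0x40   # this is mate 1
SECOND_IN_PAIR = 0x80  # this is mate 2
SECONDARY = 0x100      # secondary alignment of the read
SUPPLEMENTARY = 0x800  # supplementary (e.g. chimeric) alignment


def describe_flag(flag):
    """Hypothetical helper: decode the pairing-related bits of a SAM FLAG."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "first_in_pair": bool(flag & FIRST_IN_PAIR),
        "second_in_pair": bool(flag & SECOND_IN_PAIR),
        "secondary": bool(flag & SECONDARY),
        "supplementary": bool(flag & SUPPLEMENTARY),
    }


if __name__ == "__main__":
    # 99 and 147 are common flags for a properly paired mate 1 / mate 2.
    for flag in (99, 147, 2209):
        print(flag, describe_flag(flag))
```

If one read name shows up with two `/2` lines, as above, one of them may be a secondary or supplementary alignment; checking these bits in the original BAM would tell you.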
Even cropped, it doesn't look like a BAM: the read name should be the first column.
I know, it is a BAM file that has been converted to BED and then processed (mangled?) in Python. I hope the question is still understandable and that the problem I describe is still valid, though: how do I know which reads belong together? But thanks for pointing it out, so people reading this question do not misunderstand.
The trailing "/1" marks the first mate and "/2" the second mate.
But how does it matter once you have converted them to BED?
It matters since I want to find the region covered by each matepair.
Okay, then in the sequence ID column the suffix "/1" marks the first mate and "/2" the second mate.
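To get the region covered by each mate pair from BED-like lines, one option is to group intervals by read name with the "/1" or "/2" suffix stripped, then combine one interval from each mate. This is only a rough sketch: it assumes all records are on the same chromosome (as in the question), that records are `(start, end, name)` tuples, and that a pair is only believable when it spans at most `max_span` bases. The names `split_mate`, `pair_regions` and the `max_span` default of 1000 are made up for illustration, not taken from any tool.

```python
from collections import defaultdict
from itertools import product


def split_mate(name):
    """Split 'readname/1' into ('readname', '1'); mate is None if no suffix."""
    base, sep, mate = name.rpartition("/")
    if sep and mate in ("1", "2"):
        return base, mate
    return name, None


def pair_regions(records, max_span=1000):
    """Return {fragment_name: (start, end)} for unambiguous mate pairs.

    records: iterable of (start, end, name) on a single chromosome.
    max_span: assumed upper bound on a plausible pair span (hypothetical).
    """
    by_fragment = defaultdict(lambda: {"1": [], "2": []})
    for start, end, name in records:
        fragment, mate = split_mate(name)
        if mate is not None:
            by_fragment[fragment][mate].append((start, end))

    regions = {}
    for fragment, mates in by_fragment.items():
        candidates = []
        # Try every combination of one mate-1 and one mate-2 interval.
        for (s1, e1), (s2, e2) in product(mates["1"], mates["2"]):
            start, end = min(s1, s2), max(e1, e2)
            if end - start <= max_span:
                candidates.append((start, end))
        # Report only when exactly one combination is plausible.
        if len(candidates) == 1:
            regions[fragment] = candidates[0]
    return regions


if __name__ == "__main__":
    name = "2PJ3LS1:183:C5RR7ACXX:4:1101:21144:7110"
    records = [
        (42629192, 42629262, name + "/2"),
        (43729562, 43729595, name + "/1"),
        (78061166, 78061267, name + "/2"),
    ]
    print(pair_regions(records))
```

With the three intervals from the question, no combination of a "/1" and a "/2" interval falls within the assumed span limit, so no region is reported for that read name. If several combinations were close enough, the sketch also skips the read rather than guessing which alignment is the odd one out.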