Hi, I have a large paired-end dataset in BAM format and a list of read IDs, each of which belongs to one mate of a pair. I want to extract the corresponding second mates from the whole dataset. Could you suggest an efficient way to do this, for example with Bio-SamTools or something similar? It should be both memory- and time-efficient. Thanks!
How many read IDs do you have? Do they fit in memory?
Right now it's only about 200, but for other datasets it could be many more, 10^3 to 10^5.
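With at most around 10^5 IDs, the whole list should fit comfortably in memory as a set, so one streaming pass over the alignments is probably enough. Here is a minimal sketch in Python rather than Bio-SamTools, assuming the records arrive as SAM text (for example piped from `samtools view in.bam`), that the IDs are bare read names without a /1 or /2 suffix, and that you know which mate the list refers to. The function name `extract_mates` and the `known_mate` parameter are made up for illustration.

```python
# SAM FLAG bits, as defined in the SAM specification
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def extract_mates(sam_lines, read_ids, known_mate=1):
    """Yield SAM records that are the other mate of the listed read IDs.

    sam_lines  : iterable of SAM text lines (header lines are skipped)
    read_ids   : iterable of read names (QNAME), assumed to have no /1 or /2 suffix
    known_mate : 1 if the listed IDs are first mates, 2 if they are second mates
                 (hypothetical parameter, not part of any library)
    """
    wanted = set(read_ids)  # set gives O(1) average lookup per record
    mate_bit = SECOND_IN_PAIR if known_mate == 1 else FIRST_IN_PAIR
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t", 2)  # only QNAME and FLAG are needed
        if len(fields) < 3:
            continue  # not a valid alignment record
        qname, flag = fields[0], int(fields[1])
        if qname not in wanted:
            continue
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only the primary record of the mate
        if flag & mate_bit:
            yield line
```

To use it, read the IDs from your list into `ids` and loop over `extract_mates(sys.stdin, ids)` while piping in `samtools view in.bam`, writing each yielded line to the output. Memory use is only the ID set, and the time is one pass over the file. Recent samtools versions also seem to have a read-name file option for `view` (`-N`), which together with `-f 128` might do the same thing without any script, but check that your version supports it.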