Dear all,
I currently have two files of aligned data in BED format, each containing one mate of a paired-end sequencing run, say s11.bed and s22.bed. The two files contain different numbers of reads; some reads can be paired and some cannot. Usually I first combine the two files, then sort the combined file (with the Linux sort command), and then scan it line by line. Because the IDs of paired reads end up next to each other in the sorted file (for example DBV2SVN164712082000517676201 and DBV2SVN164712082000517676202), I only need to remember the previous line's ID to do the pairing. However, as the alignment files grow larger and larger, sorting becomes quite time- and memory-consuming. Does anybody here know a better way to do this kind of pairing? Thanks in advance.
By the way, the genome is the yeast genome, so even if I split the file into subfiles by chromosome, each subfile is still very large.
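
To make the current approach concrete, here is a minimal Python sketch of the sort-then-scan pairing described above. It rests on assumptions that are not stated in the post: that the read ID sits in the fourth tab-separated column (the BED name field), that the two mates of a pair share the same ID except for the final character (as in the two example IDs), and that the input is already sorted by that column. The function names read_id, mate_key, and pair_adjacent are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_id(line: str) -> str:
    # Assumption: the read ID is the 4th tab-separated column (BED "name").
    return line.split("\t")[3]


def mate_key(line: str) -> str:
    # Assumption: mates differ only in the last character of the ID,
    # so dropping it gives a key shared by both mates.
    return read_id(line)[:-1]


def pair_adjacent(sorted_lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Scan lines sorted by read ID and pair neighbours with the same key.

    Yields (first, second) for a pair and (line, None) for an unpaired read.
    Only the previous line is kept in memory.
    """
    prev: Optional[str] = None
    for raw in sorted_lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if prev is not None and mate_key(prev) == mate_key(line):
            yield prev, line
            prev = None  # both mates consumed
        else:
            if prev is not None:
                yield prev, None  # previous read had no mate
            prev = line
    if prev is not None:
        yield prev, None  # last read had no mate
```

With the combined file sorted on the ID column (for example with sort -k4,4, if the ID really is in column 4), the scan can be run directly over the open file handle, e.g. `for first, second in pair_adjacent(open("combined.sorted.bed")): ...`. The scan itself needs only constant memory; the cost that grows with file size is the sort that precedes it.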