Hello,
I am having trouble finding software that can remove duplicate reads from aligned files (single-end SAM/BAM or BED files). I want to keep the top "n" aligned reads (ranked by mapping quality) when more than "n" reads align to the same position. I tried Picard, but it marks reads as duplicates as soon as two or more reads align to the same position, and it does not seem to have an option to set "n". Is there any other software I can use to accomplish this? I would like to use n=5.
Thanks a lot!
Thanks for the suggestion. As I do not have any Java skills, I took the option Istvan Albert suggested below. I scanned through the SAM file and wrote out a list of the names of the reads that are aligned more than N times to the same position. Then I used FilterSamReads (Picard tools) to remove those reads. I know it is not elegant, but it does the job, and pretty quickly as well.
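
For anyone who wants to try the same approach, here is a rough Python sketch of the scanning step. It is not my exact script, just an illustration under a few assumptions: the input is a plain-text SAM file, "same position" means the same reference name, leftmost POS, and strand, and ties in mapping quality are broken by file order. The function names reads_to_remove and write_read_list are made up for this example.

```python
from collections import defaultdict


def reads_to_remove(sam_lines, n=5):
    """Return names of reads beyond the top n (by MAPQ) at each position."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, mapq = int(fields[3]), int(fields[4])
        if flag & 0x4:
            continue  # unmapped read, no position to group on
        strand = "-" if flag & 0x10 else "+"
        # Assumption: group on (reference, leftmost position, strand).
        groups[(rname, pos, strand)].append((mapq, qname))

    drop = []
    for reads in groups.values():
        if len(reads) > n:
            # Stable sort: reads with equal MAPQ keep their file order.
            ranked = sorted(reads, key=lambda r: r[0], reverse=True)
            drop.extend(name for _, name in ranked[n:])
    return drop


def write_read_list(sam_path, out_path, n=5):
    """Write one read name per line, for use as a read list file."""
    with open(sam_path) as sam:
        names = reads_to_remove(sam, n)
    with open(out_path, "w") as out:
        for name in names:
            out.write(name + "\n")
```

The resulting file can then be given to FilterSamReads as a read list with the exclude filter. Note that grouping on the leftmost POS is a simplification for reverse-strand reads, whose 5' end is at the other end of the alignment.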