Dear All,
I have a BAM file of paired-end sequencing reads. I want to count how many read pairs have both ends mapped to the same chromosome with a distance (in base pairs) between the two ends greater than 100 kb, and how many read pairs have their ends mapped to different chromosomes with a distance greater than 100 kb. Can anybody help me get this done? Thank you very much in advance.
Thank you so much for your help. I will try your script.
How can I calculate these numbers for all chromosomes efficiently? And what about pairs whose ends mapped to different chromosomes? Thank you again.
You can cut the chromosome field out from the distances.txt file and then count how often each chromosome identifier appears. For example:
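A minimal sketch of that per-chromosome count, assuming distances.txt is tab-separated with one line per read pair and the chromosome name in the first column (adjust the `-f` value if your file is laid out differently). The function name `count_per_chrom` is only an illustrative helper, not part of any tool:

```shell
# Assumption: column 1 of the tab-separated input file is the chromosome name.
# Prints one line per chromosome: "<count> <chromosome>".
count_per_chrom() {
    cut -f1 "$1" | sort | uniq -c
}
```

Calling `count_per_chrom distances.txt` handles all chromosomes in a single pass, so there is no need to loop over them one by one.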
You can easily discover the number of reads whose mate mapped to a different chromosome with
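One possible way to do this, sketched under the assumption that you stream SAM records (without the header) into a small awk filter. In SAM, column 7 (RNEXT) is "=" when the mate is on the same reference and "*" when no mate information is available, so anything else means the mate is on a different chromosome. The function name `count_interchrom` is a hypothetical helper:

```shell
# Reads SAM records (no header) on stdin and prints the number of reads
# whose mate is on a different reference sequence.
# Column 7 (RNEXT): "=" means same reference, "*" means unavailable.
count_interchrom() {
    awk -F'\t' '$7 != "=" && $7 != "*" {n++} END {print n+0}'
}
```

You could feed it with something like `samtools view -F 2316 in.bam | count_interchrom`, where in.bam stands for your file and `-F 2316` skips unmapped reads, reads with an unmapped mate, and secondary and supplementary alignments. This counts reads, not pairs, so with both mates present the number of pairs should be about half of the result.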