How to count the number of unique alignments in PE reads? How to account for an unmapped mate?
7.5 years ago
kirannbishwa01 ★ 1.6k

I am trying to find the number of reads that align uniquely, to two locations, to three locations, and to multiple locations in a BAM file.

This BAM was created from paired-end (PE) reads; below are two lines from the SAM file:

HWI-D00123R2:155:C4AM3ACXX:2:1103:4696:31199    403 AL2G13910.t1    5619    60  100M    =   5586    -133    GAGATAGTGGAATCATCCGAACCCTGGATGTCCCGATCTATATCACCAAGGTGTCTGGTAATACAATCTTCTGCTTGGATCGGGATGGGAAAAACAAGGC    DDDDCDDEDDEEDDDDDDDAADDDEEEEDFFHHHJJIJJJIIGF=JIJIHJJIIIJJJJJJIIJJJJJJJJJIJJJJJJJJJJJJJJHHHHHFFFFFCCC    NH:i:1  HI:i:1
HWI-D00123R2:155:C4AM3ACXX:2:1103:4698:74520    419 AL2G30660.t1    2065    60  88M =   2141    176 CTGAACCTGCCATCGCTGGAAACGTATCTGCTGCCTCACCAGTTGATGACAAGAACGATGATGGAGATGAACATCACGAGATCGATCT    GJJJIJJJJJJJJJJJJJJJJJJJHIJJJJJJJJJJJIHHIJJJJJJJJJJJJJJHHHFFFFFEEEEEDDDDDDDDDDDDDDDDDDDD    NH:i:1  HI:i:1
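It helps to decode the FLAG column (the second field) of these two records before counting anything. Below is a minimal Python sketch, assuming the bit definitions from the SAM format specification; the short names such as `mate_unmapped` are my own labels, not official identifiers.

```python
# Bit values of the SAM FLAG field (column 2), per the SAM format specification.
# The short names are informal labels chosen for readability.
SAM_FLAG_BITS = {
    0x1: "paired",          # template has multiple segments
    0x2: "proper_pair",     # each segment properly aligned
    0x4: "unmapped",        # this segment is unmapped
    0x8: "mate_unmapped",   # next segment in the template is unmapped
    0x10: "reverse",        # SEQ is reverse complemented
    0x20: "mate_reverse",   # SEQ of the next segment is reverse complemented
    0x40: "read1",          # first segment in the template
    0x80: "read2",          # last segment in the template
    0x100: "secondary",     # secondary alignment
    0x200: "qc_fail",       # not passing quality controls
    0x400: "duplicate",     # PCR or optical duplicate
    0x800: "supplementary", # supplementary alignment
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The two FLAG values from the example records.
    for flag in (403, 419):
        print(flag, decode_flag(flag))
```

For 403 and 419 this reports read 2 of a properly paired template with the 0x100 (secondary) bit set, even though both records carry NH:i:1. Aligners may differ in how they use the secondary bit, so it is worth checking before deciding whether to filter on it.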

To count unique alignments I used something like:

# count SAM lines (alignment records, not reads) whose NH tag is exactly 1
grep -P '\tNH:i:1(\t|$)' transcripts_test.sam | wc -l

The NH:i:<n> tag indicates how many locations the read was aligned to. But since the BAM is based on PE reads, should that count be divided by two? If so, the count won't be right when only one mate is uniquely mapped, right?
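One way to avoid the divide-by-two question is to count per read pair (QNAME) rather than per SAM line, keeping the NH value of each mate separately so that a uniquely mapped mate with an unmapped partner gets its own category. This is a minimal Python sketch over plain SAM text. It assumes that all alignments of one mate report the same NH value, and that an unmapped mate either appears with flag 0x4 or is missing entirely. The function names and category labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional


def nh_value(fields: list) -> Optional[int]:
    """Return the NH:i value from a tab-split SAM record, or None if absent."""
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def classify_pairs(sam_lines: Iterable[str]) -> Counter:
    """Count read pairs (grouped by QNAME) by the NH status of both mates.

    Assumptions (check them against your aligner's output):
      * paired-end data, so every record has flag 0x40 (read 1) or 0x80 (read 2);
      * all alignments of one mate carry the same NH value;
      * an unmapped mate is either a record with flag 0x4 or absent altogether.
    """
    mates = defaultdict(dict)  # QNAME -> {"R1": NH or None, "R2": NH or None}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        mate = "R1" if flag & 0x40 else "R2"
        if flag & 0x4:
            mates[qname].setdefault(mate, None)  # mate reported as unmapped
        else:
            mates[qname][mate] = nh_value(fields)

    def level(nh: Optional[int]) -> str:
        if nh is None:
            return "unmapped"  # no mapped record (or no NH tag) for this mate
        return "unique" if nh == 1 else "multi"

    counts = Counter()
    for by_mate in mates.values():
        # Sort so that unique/multi and multi/unique fall in the same category.
        labels = sorted(level(by_mate.get(m)) for m in ("R1", "R2"))
        counts["+".join(labels)] += 1
    return counts
```

The result has keys such as `unique+unique`, `multi+unique` and `unique+unmapped`. The dictionary holds every QNAME in memory, so for a large file a streaming variant over name-sorted input (`samtools sort -n`) may be preferable.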

Also, is there a tool I can use to produce a list of the number of reads at each alignment level (unique, two locations, three locations, and so on)?
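Short of a dedicated tool, a per-level table can be built in a few lines. This Python sketch counts each read segment (each mate of a pair) once, however many alignment lines it has, and bins it by its exact NH value rather than by a substring match, so NH:i:10 is not mistaken for NH:i:1. The bin labels and the default cut-off of 3 are my own choices, not a standard.

```python
from collections import Counter
from typing import Iterable


def nh_histogram(sam_lines: Iterable[str], max_level: int = 3) -> Counter:
    """Count read segments (each mate once) by NH value.

    NH values above max_level are pooled into a single ">max_level" bin.
    Unmapped records (flag 0x4) and records without an NH tag are skipped.
    """
    seen = set()  # (QNAME, read1/read2 bits) keys already counted
    hist = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped segment
        nh = next((int(t[5:]) for t in fields[11:] if t.startswith("NH:i:")), None)
        if nh is None:
            continue  # no NH tag on this record
        key = (fields[0], flag & 0xC0)  # 0x40 = read 1, 0x80 = read 2, 0 = single-end
        if key in seen:
            continue  # another alignment of an already counted segment
        seen.add(key)
        hist[str(nh) if nh <= max_level else f">{max_level}"] += 1
    return hist
```

For a BAM file, the records could be streamed with `samtools view <file>.bam` into a small script that calls `nh_histogram(sys.stdin)` and prints the sorted bins.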

Thanks,

bam count alignment genome