7.5 years ago
kirannbishwa01
I am trying to find the number of reads in a BAM file that align uniquely, to two locations, to three locations, and to multiple locations.
This BAM was created from paired-end (PE) reads; below are two lines from the SAM file:
HWI-D00123R2:155:C4AM3ACXX:2:1103:4696:31199 403 AL2G13910.t1 5619 60 100M = 5586 -133 GAGATAGTGGAATCATCCGAACCCTGGATGTCCCGATCTATATCACCAAGGTGTCTGGTAATACAATCTTCTGCTTGGATCGGGATGGGAAAAACAAGGC DDDDCDDEDDEEDDDDDDDAADDDEEEEDFFHHHJJIJJJIIGF=JIJIHJJIIIJJJJJJIIJJJJJJJJJIJJJJJJJJJJJJJJHHHHHFFFFFCCC NH:i:1 HI:i:1
HWI-D00123R2:155:C4AM3ACXX:2:1103:4698:74520 419 AL2G30660.t1 2065 60 88M = 2141 176 CTGAACCTGCCATCGCTGGAAACGTATCTGCTGCCTCACCAGTTGATGACAAGAACGATGATGGAGATGAACATCACGAGATCGATCT GJJJIJJJJJJJJJJJJJJJJJJJHIJJJJJJJJJJJIHHIJJJJJJJJJJJJJJHHHFFFFFEEEEEDDDDDDDDDDDDDDDDDDDD NH:i:1 HI:i:1
To count unique alignments I used something like:
grep NH:i transcripts_test.sam | wc -l
Well, the NH:i:... tag indicates how many locations the read aligned to. But since the BAM is based on PE reads, should that count be divided by two? If so, the counts won't be right when only one mate is uniquely mapped, right?
Also, is there a tool I can use to produce a list of the number of reads at each level of alignment?
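To make the kind of tally I have in mind concrete, here is a minimal Python sketch, not an existing tool. It assumes tab-separated SAM records in which the aligner writes an NH:i tag on every mapped record, and it counts each mate separately instead of dividing by two, so a pair with only one uniquely mapped mate is still counted correctly. The function name `count_reads_by_nh` is hypothetical.

```python
from collections import Counter


def count_reads_by_nh(sam_lines):
    """Tally distinct reads per NH value, counting each mate separately.

    Assumes tab-separated SAM records with an NH:i tag on mapped reads.
    A read reported at several loci appears on several lines, so each
    (read name, mate) pair is counted only once.
    """
    seen = set()
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (@HD, @SQ, ...).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        # 0x40 = first mate, 0x80 = second mate, 0 = single-end read.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        nh = None
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            continue  # no NH tag on this record
        key = (qname, mate)
        if key in seen:
            continue  # another alignment of a read already counted
        seen.add(key)
        counts[nh] += 1
    return counts
```

The function takes any iterable of SAM lines, so it could be given `sys.stdin` when the script is fed from `samtools view transcripts_test.bam`. The returned counter maps each NH value to a number of reads: key 1 holds the uniquely aligned reads, key 2 those aligned to two locations, and so on.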
Thanks,