How To Find Unique Reads?
3
1
Entering edit mode
11.1 years ago
alok.helix ▴ 120

i have am working on a transcriptomics project of a non-model plant and i want to know from you how to count the uniquely mapped reads in sam format or the bam format. I have done the indexing of my file with BWA-mem algo for the paired end illumina data and generated the sam and bam file using samtools. I used the command $ cut -f1 myfile.sam | sort | uniq -u | wc -l but i got zero for all the input file. After closely reading the BWA manual i noticed a TAG of XT which i guess will be used to distinguish unique/repeat/Mate -sw etc. but after using the grep in following command:

$ samtools view bwa.bam | grep "XT:A:U" it yielded zero...

Now i am seriously doubting my commands and getting caught in the vicious circle of various commands including this one:

$ samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam

The samtools manual says NO alignment should be assigned a mapping quality of 255. I request you to look into the above query and throw some light on uniquely mapped reads and its searching?? I request you to mail the explanation or links on alok.helix@gmail.com. Thanking you Alok

rnaseq reads • 16k views
ADD COMMENT
3
Entering edit mode
11.1 years ago

Note that the SAM specification does not require that any of the tags be present.

bwa-mem is a preeliminary release, as far as I know will not fill in the XT tag (or perhaps it won't fill the XA tag, one or both I can't recall which)

use the bwa aln method or a different aligner

ADD COMMENT
3
Entering edit mode
11.1 years ago
KCC ★ 4.1k

I once looked at this problem as well. I recall three ways of identifying a read that was mapped to multiple positions with BWA:

  1. the presence of the characters XT:A:R
  2. the presence of the characters XA:Z:
  3. mapping quality (MAPQ) of 0

Based on my notes, these results all gave similar, but slightly different answers. I have found that people in the bioinformatics community tend to favor option 3.

You can just do simple grep searches on the SAM file for lines with the XT:A:R or XA:Z: present for the first two. You can use samtools view -q1 my.bam to get the last one.

ADD COMMENT
1
Entering edit mode
11.1 years ago

If your input is in sam format check XS:i:<N> it will only be present if the SAM record is for an aligned read and more than one alignment was found for the read. so grep -v XS filename would solve your problem

ADD COMMENT
0
Entering edit mode

Hi I have used the following command

$ samtools view .bam | fgrep "XS:i:0" | wc -l
46,953,719

This definately cannot be uniquley mapped. then I used

$samtools view .bam |fgrep "NM:i:1" | wc -l
5,865,151
$ samtools view .bam | fgrep "XS:i:1" | wc -l
1,862,839

but all are in vain I guess....

ADD REPLY
0
Entering edit mode

kindly read my reply properly "it will only be present if the SAM record is for an aligned read and more than one alignment was found for the read".

ADD REPLY

Login before adding your answer.

Traffic: 2032 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6