How to deal with asterisk in bam file after alignment with STAR
2.6 years ago
brgs • 0

I have aligned paired-end RNA CLIP-seq data to the human genome, and the output BAM file contains some reads like:

AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560   89  NC_000001.11    51781228    255 38M *   0   0   TTTCATGCGGGAAGGAAAGGATCAGTTGCCAAAAAGCC  <<//BBF<BBFFFFBFFFBFFFFBFBBFF<FFFFFF<F  NH:i:1  HI:i:1  AS:i:37 nM:i:0

I am wondering what the * in this record means. Most reads have = at that position, and when there is a = there are always two reads with the same head, but when there is a * there is only one read with that head.

(what I mean by head is this part: AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560)

I also want to know how to filter out reads with * using samtools or other tools. Thanks a lot for your help.

samtools bam alignment STAR • 1.6k views
2.6 years ago

I believe the * is in the RNEXT field (Reference sequence name of the primary alignment of the NEXT read in the template) and indicates that this information is not available. Basically, it means that the other read in the pair is not mapped.
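To check this on the record from the question, here is a minimal Python sketch that decodes its FLAG value (89) and reads its RNEXT column. The bit values come from the SAM specification; the `FLAG_BITS` table and the `decode_flag` helper are names made up for this illustration, and the record is rebuilt by hand with tab separators.

```python
# SAM FLAG bits (values from the SAM specification); the short labels are
# illustrative names chosen for this sketch.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The alignment from the question, rebuilt as a tab-separated SAM line.
record = "\t".join([
    "AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560",
    "89", "NC_000001.11", "51781228", "255", "38M", "*", "0", "0",
    "TTTCATGCGGGAAGGAAAGGATCAGTTGCCAAAAAGCC",
    "<<//BBF<BBFFFFBFFFBFFFFBFBBFF<FFFFFF<F",
    "NH:i:1", "HI:i:1", "AS:i:37", "nM:i:0",
])

fields = record.split("\t")
flag = int(fields[1])   # column 2: FLAG
rnext = fields[6]       # column 7: RNEXT

print(decode_flag(flag))
print(rnext)
```

For this record the decoded flag includes the mate-unmapped bit (0x8), which is consistent with RNEXT being *.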

You can filter these alignments on the following SAM flag bit, for example by excluding it with samtools view -F 8:

0x8     8  MUNMAP         next segment in the template unmapped
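As a rough illustration of what excluding the 0x8 bit does, the sketch below applies the same test to SAM text lines in plain Python. The function name `drop_mate_unmapped` and the three example lines (with shortened, made-up read names and sequences) are assumptions for this sketch, not part of samtools or STAR.

```python
MATE_UNMAPPED = 0x8  # "next segment in the template unmapped"


def drop_mate_unmapped(sam_lines):
    """Yield header lines and alignments whose FLAG lacks the 0x8 bit."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])  # column 2 holds the FLAG
        if not flag & MATE_UNMAPPED:
            yield line


# Hypothetical input: one header line and two alignments.
lines = [
    "@HD\tVN:1.4",
    "readA\t89\tNC_000001.11\t51781228\t255\t38M\t*\t0\t0\tACGT\tFFFF",
    "readB\t99\tNC_000001.11\t51781300\t255\t4M\t=\t51781400\t104\tACGT\tFFFF",
]

kept = list(drop_mate_unmapped(lines))
for line in kept:
    print(line)
```

Here readA (flag 89, RNEXT *) is dropped, while the header and readB (flag 99, RNEXT =) are kept. For real BAM files the samtools command is the more practical route; this only shows the logic of the bit test.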

Thanks! It works!


Sorry, one more question: why can this happen? I searched online but haven't found a good explanation of why one read of a pair can't be mapped to the genome.


A simple and biologically relevant explanation would be a contaminant. Take any organism that shares some similarity with your reference: a fragment that starts in a similar region but ends in a dissimilar region will have a broken pair.

You could also have fusions in the sample. Fused sequences produce fragments that don't quite exist in the reference.

There could also be other odd things happening, more on the measurement or sequencing-error side, such as one read of the pair being degraded.
