Question

extract unique sequences on bam file

0

Entering edit mode

5.4 years ago

SeaStar ▴ 50

I ve aligned a list of sequence to a transcript to observe if this sequences are present in the transcript. I produced a BAM file. This is the result of my Alignement.

Left reads:
          Input     :  27849344
           Mapped   :     16442 ( 0.1% of input)
            of these:      7259 (44.1%) have multiple alignments (56 have >100)
Right reads:
          Input     :  27849344
           Mapped   :     15211 ( 0.1% of input)
            of these:      8390 (55.2%) have multiple alignments (56 have >100)
 0.1% overall read mapping rate.

Aligned pairs:      5975
     of these:      3776 (63.2%) have multiple alignments
                     194 ( 3.2%) are discordant alignments
 0.0% concordant pair alignment rate.
~

Now I have to see that I don't count the same read twice, because I have changed the multiple alignment parameters. If a read maps on two different sequences, it is always the same read. I have to count the reads as unique. So how can I made this check?

sequence bam r align • 2.1k views

ADD COMMENT • link 5.4 years ago by SeaStar ▴ 50

4

Entering edit mode

Now I have to see that I don't count the same read twice

Before doing that, I would worry about the 0.1% overall read mapping rate.... Do you think it is normal to have such a low alignment rate ?

ADD REPLY • link 5.4 years ago by Carlo Yague 8.9k

1

Entering edit mode

I ve aligned a list of sequence to a transcript

This is likely because full dataset is being aligned to a transcript. Not generally recommended but probable explanation for that 0.1% figure in this case.

ADD REPLY • link 5.4 years ago by GenoMax 147k

0

Entering edit mode

I'VE aligned a set of transposon to a transcript to observe if these transposable elements are presents. It is a experimental test. I've used tophat

ADD REPLY • link 5.4 years ago by SeaStar ▴ 50

0

Entering edit mode

What exactly are you trying to show? Is the following interpretation correct.

Threre are reads that align to the transcript of interest.
In case where 1 is true the read aligns only once?

ADD REPLY • link 5.4 years ago by GenoMax 147k

0

Entering edit mode

Yes, it is correct. I want to check if the reads don't align 2 or more times the same sequence

ADD REPLY • link 5.4 years ago by SeaStar ▴ 50

0

Entering edit mode

Depends on the alignment tool you used and whether or it includes such tags.

Also please use the search function.

ADD REPLY • link 5.4 years ago by benformatics 4.0k

0

Entering edit mode

I'VE used tophat. I would make this analysis on R

ADD REPLY • link 5.4 years ago by SeaStar ▴ 50

0

Entering edit mode

I'VE used tophat. I would make this analysis on R

There is no logical overlap between those two. One is a splice-aware alignment program and other is a statistical programming environment.

ADD REPLY • link 5.4 years ago by GenoMax 147k

0

Entering edit mode

Sorry, I would to analize the bam file produced with tophat in r environment

ADD REPLY • link 5.4 years ago by SeaStar ▴ 50

0

Entering edit mode

What kind of analysis?

ADD REPLY • link 5.4 years ago by GenoMax 147k

0

Entering edit mode

A kind of analysis that allows me to count the repetitive elements in the trascripts. Anyway I ve tried a solution. I opened the bam with samtools, and I made unique() to count the unique reads that drop in some elements of interest. For example I made unique to all the reads presents in SINEs

ADD REPLY • link 5.4 years ago by SeaStar ▴ 50

score 1 · Answer 1 · 2019-07-17

I assume you used tophat2 for this alignment. Refer to this post which explains mapping qualities for tophat2: Tophat2 Mapping Qualities

I suggest that you try to see how many reads show up with following commands (you are just getting counts):

$ samtools view -c -q 30 your.bam
$ samtools view -c -q 40 your.bam
$ samtools view -c -q 45 your.bam
$ samtools view -c -q 50 your.bam

That last command will likely show reads that are uniquely mapped. You can remove -c if you want to capture those to a file.