Mapping to the transcriptome using Bowtie - am I doing it right?
2
0
Entering edit mode
8.8 years ago
A. Domingues ★ 2.7k

I am trying to reproduce some results from a paper, in which smallRNA sequencing reads are mapped to the transcriptome using bowtie. Since there is no code, these are the steps I did:

  1. Created a Transcriptome reference index with tophat -G Danio_rerio.gtf --transcriptome-index=TophatBowtieIndexTranscriptome/chr --bowtie1 BowtieIndexChr/chr;
  2. Used the created index to map using bowtie.

Looking at the mapped results something seemingly strange is going on - each gene in the GTF appears to become a "contig" for the mapping:

samtools view -H input.bam | head

@HD VN:1.0 SO:coordinate @SQ SN:0 LN:282 @SQ SN:1 LN:2870 @SQ SN:2 LN:2514 @SQ SN:3 LN:1809 @SQ SN:4 LN:1702 @SQ SN:5 LN:1590 @SQ SN:6 LN:1690 @SQ SN:7 LN:907 @SQ SN:8 LN:708

And the alignments look like this:

samtools view input.bam | head

HWI-ST558:373:C8897ACXX:2:1304:8723:58897 16 22 909 255 43M * 0 0 CCACTGAGAATGAGGTTCCAGTAGGCCACCGCCATCTCCAGAT JIHJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJIHHHHHFFFF XA:i:0 MD:Z:43 NM:i:0 HWI-ST558:373:C8897ACXX:2:1307:8956:47009 0 35 1956 255 23M * 0 0 AGACTGTTAGTTTCGAGAAAGAT FFFDHHDHHIFHGIIIJIIJIJJ XA:i:0 MD:Z:23 NM:i:0 HWI-ST558:373:C8897ACXX:3:1212:14852:61943 0 35 3267 255 22M * 0 0 AGATGATGATTGAGACTCAGGC FFFFHHHHHJJJJJJJJJJJJG XA:i:0 MD:Z:22 NM:i:0

Is this is the expected result, or was the index not created appropriately? As someone else done anything similar?

bowtie gtf transcriptome zebrafish RNA-Seq • 5.7k views
ADD COMMENT
3
Entering edit mode
8.8 years ago
michael.ante ★ 3.9k

If you align against the transcriptome, this is the expected outcome. The numbering from 0 to #transcripts comes from the fact, how the fasta file for the transcript-sequences was generated. The fasta header looks somewhat like

>0 ENST00000456328 1+ 11869-12227,12613-12721,13221-14409

Bowtie uses in such a case only the first part of the fasta identifier and discards the rest. So, from the numbering in your alignment, you can map the transcript identifier from the fasta file.

Cheers,

Michael

ADD COMMENT
1
Entering edit mode

Cheers. Later, I dug up a little more in the generated index and also reached that conclusion.

ADD REPLY
2
Entering edit mode
8.8 years ago
mastal511 ★ 2.1k

If you are mapping to the transcriptome, rather than the genome, then I think that is what you would expect.

ADD COMMENT

Login before adding your answer.

Traffic: 1993 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6