I am trying to reproduce some results from a paper, in which smallRNA sequencing reads are mapped to the transcriptome using bowtie. Since there is no code, these are the steps I did:
- Created a Transcriptome reference index with
tophat -G Danio_rerio.gtf --transcriptome-index=TophatBowtieIndexTranscriptome/chr --bowtie1 BowtieIndexChr/chr
; - Used the created index to map using bowtie.
Looking at the mapped results something seemingly strange is going on - each gene in the GTF appears to become a "contig" for the mapping:
samtools view -H input.bam | head
@HD VN:1.0 SO:coordinate @SQ SN:0 LN:282 @SQ SN:1 LN:2870 @SQ SN:2 LN:2514 @SQ SN:3 LN:1809 @SQ SN:4 LN:1702 @SQ SN:5 LN:1590 @SQ SN:6 LN:1690 @SQ SN:7 LN:907 @SQ SN:8 LN:708
And the alignments look like this:
samtools view input.bam | head
HWI-ST558:373:C8897ACXX:2:1304:8723:58897 16 22 909 255 43M * 0 0 CCACTGAGAATGAGGTTCCAGTAGGCCACCGCCATCTCCAGAT JIHJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJIHHHHHFFFF XA:i:0 MD:Z:43 NM:i:0 HWI-ST558:373:C8897ACXX:2:1307:8956:47009 0 35 1956 255 23M * 0 0 AGACTGTTAGTTTCGAGAAAGAT FFFDHHDHHIFHGIIIJIIJIJJ XA:i:0 MD:Z:23 NM:i:0 HWI-ST558:373:C8897ACXX:3:1212:14852:61943 0 35 3267 255 22M * 0 0 AGATGATGATTGAGACTCAGGC FFFFHHHHHJJJJJJJJJJJJG XA:i:0 MD:Z:22 NM:i:0
Is this is the expected result, or was the index not created appropriately? As someone else done anything similar?
Cheers. Later, I dug up a little more in the generated index and also reached that conclusion.