Question

Aligning Reads To A Reference Transcriptome

2


11.1 years ago

Prakki Rama ★ 2.7k

Hi all,

Could I please know whether we can align reads to a reference transcriptome instead of a reference genome, and then assemble a transcriptome using TopHat/Cufflinks? Are there any potential advantages or disadvantages to doing so? Any thoughts on using BWA (a non-spliced aligner) for this task?

Please bear with me if I have not phrased this properly. Thanks in advance for your suggestions.

transcriptome cufflinks bwa • 17k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 11.1 years ago by Prakki Rama ★ 2.7k

9


For TopHat, check the manual page: http://tophat.cbcb.umd.edu/manual.shtml

-G/--GTF <GTF/GFF3 file>

Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.

-T/--transcriptome-only

Only align the reads to the transcriptome and report only those mappings as genomic mappings.
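The conversion of transcriptome hits to genomic mappings "spliced as needed" can be pictured with a toy sketch. This is not TopHat's code; the function name `transcript_to_genome`, the 0-based half-open coordinates, and the forward-strand-only handling are all assumptions made for illustration.

```python
def transcript_to_genome(exons, t_start, length):
    """Convert an alignment on a transcript into genomic blocks.

    exons   : sorted list of (genomic_start, genomic_end) pairs, 0-based,
              half-open, forward strand only (an assumption of this sketch)
    t_start : 0-based alignment start in transcript coordinates
    length  : number of aligned bases
    Returns a list of (genomic_start, genomic_end) blocks; more than one
    block means the read spans a splice junction.
    """
    if t_start < 0 or length <= 0:
        raise ValueError("t_start must be >= 0 and length must be > 0")

    blocks = []
    offset = 0          # transcript coordinate where the current exon begins
    pos = t_start       # next transcript position still to be placed
    remaining = length  # bases still to be placed

    for g_start, g_end in exons:
        exon_len = g_end - g_start
        if remaining > 0 and pos < offset + exon_len:
            within = pos - offset                   # offset inside this exon
            take = min(remaining, exon_len - within)
            blocks.append((g_start + within, g_start + within + take))
            pos += take
            remaining -= take
        offset += exon_len

    if remaining > 0:
        raise ValueError("alignment extends past the end of the transcript")
    return blocks


if __name__ == "__main__":
    # Hypothetical two-exon transcript: exons at 100-200 and 300-400.
    exons = [(100, 200), (300, 400)]
    # A 20-base read starting 10 bases before the end of exon 1
    # is split into two genomic blocks around the intron.
    print(transcript_to_genome(exons, 90, 20))
```

A read that lies inside a single exon comes back as one block; a read that crosses an exon boundary comes back as two or more blocks, which is what a spliced genomic alignment represents.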

So you can choose whether to align to the genome only, to the transcriptome plus the genome, or to the transcriptome only.

TopHat has some other options related to transcriptome mapping; I recommend checking them too.
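The three choices can be laid out side by side with a small helper that only builds the command lines and does not run anything. It is a sketch under assumptions: the helper name `tophat_command` and the file names (`genome_index`, `genes.gtf`, `reads_1.fastq`, `reads_2.fastq`, the output directories) are hypothetical, and only the -p, -o, -G and -T options are used.

```python
import shlex


def tophat_command(index_prefix, reads, gtf=None, transcriptome_only=False,
                   threads=8, out_dir="tophat_out"):
    """Build (but do not run) a TopHat command line as a list of arguments."""
    if transcriptome_only and gtf is None:
        # In this sketch, -T is only built together with an annotation (-G).
        raise ValueError("transcriptome-only mode needs a GTF/GFF3 file")

    cmd = ["tophat", "-p", str(threads), "-o", out_dir]
    if gtf is not None:
        cmd += ["-G", gtf]        # align to the annotated transcripts first
    if transcriptome_only:
        cmd.append("-T")          # skip the genome step for unmapped reads
    cmd.append(index_prefix)      # Bowtie index prefix of the genome
    cmd.extend(reads)             # one file, or two files for paired-end data
    return cmd


if __name__ == "__main__":
    reads = ["reads_1.fastq", "reads_2.fastq"]   # hypothetical file names
    modes = {
        "genome only": tophat_command("genome_index", reads, out_dir="out_genome"),
        "transcriptome + genome": tophat_command(
            "genome_index", reads, gtf="genes.gtf", out_dir="out_both"),
        "transcriptome only": tophat_command(
            "genome_index", reads, gtf="genes.gtf",
            transcriptome_only=True, out_dir="out_tx"),
    }
    for name, cmd in modes.items():
        print(name + ":", " ".join(shlex.quote(arg) for arg in cmd))
```

Writing each mode to its own output directory keeps the runs from overwriting one another, which makes it easier to compare them afterwards.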

ADD REPLY • link 11.1 years ago by jockbanan ▴ 440

1


you might wanna post this as an answer...

ADD REPLY • link 11.1 years ago by Phil S. ▴ 700

0


Thank you jockbanan. I would consider it.

ADD REPLY • link 11.0 years ago by Prakki Rama ★ 2.7k

Ram · Answer 1 · 2015-02-25

Hello everyone,

I learned some important things from this post. I have a few questions in mind that I would like to discuss here.

I am working on RNA-seq analysis with TopHat, using the default mapping parameters plus an annotation file supplied with -G. No reference is available for my species (Gossypium hirsutum), so I picked a closely related species, Gossypium arboreum. The percentage of multi-mapped reads is rather high and the number of uniquely mapped reads is low. Because this cotton is a polyploid species, I can't discard the multi-mapped reads. I ended up with poor results. My command is as follows:

python tophat.py -p 8 -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

Now I am trying another strategy, in which I want to map to the gene models rather than to the whole reference genome. Will providing -T (transcriptome only) map against the gene models only, or does it do something else? For transcriptome mapping, the command would be:

python tophat.py -p 8 -T -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

Please correct me if I am wrong anywhere.

Waiting for a reply.

Thank you in advance.

Ram · Answer 2 · 2015-02-26

-T/--transcriptome-only    Only align the reads to the transcriptome and report only those mappings as genomic mappings.

Yes, you are right, there will be only one copy of each chromosome in the FASTA file. But the reason for not filtering out the reads that multi-map against the genome is the extremely high number of repeats within it.

According to the TopHat manual, providing a GTF file leads to a transcriptome index (--transcriptome-index). Does "transcriptome" here mean the genes provided in the GTF file? Am I right, or is it something else?

TopHat mapping without -T:

python tophat.py -p 8 -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

and with -T:

python tophat.py -p 8 -T -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

I am getting different FPKM values. Why is that?

How does running TopHat with the first command differ from running it with the second one?
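One possible contributor, offered as a sketch rather than a diagnosis: FPKM is normalized by the total number of mapped fragments, and with -T the reads that do not map to the transcriptome are not aligned to the genome, so that total can change. The example below uses the standard FPKM definition with made-up numbers; the function name `fpkm` and every count in it are hypothetical, and whether this accounts for the difference depends on how the downstream tool counts mapped fragments.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments on a 2,000 bp transcript in both runs.
    fragments, length_bp = 500, 2000

    # Hypothetical totals: more fragments map when the genome is also searched
    # than when only the transcriptome is searched (-T).
    total_with_genome = 20_000_000
    total_transcriptome_only = 10_000_000

    print("without -T:", fpkm(fragments, length_bp, total_with_genome))
    print("with -T:   ", fpkm(fragments, length_bp, total_transcriptome_only))
```

In this made-up case the gene has the same fragment count in both runs, yet its FPKM differs because only the denominator changed. The per-gene counts themselves can also differ between the two runs, so this shows one mechanism, not the whole explanation.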