Question

How to optimize TopHat for targeted RNA data analysis

7.3 years ago

genya35 ▴ 50

Hello,

Could someone please suggest optimal TopHat parameters for analyzing Ion Torrent targeted RNA data? I need to identify the breakpoints and also view unaligned reads in IGV. I plan to run TopHat through Galaxy to test it out before installing it on the server.

Any suggestions would be greatly appreciated.

Thanks

RNA-Seq • 1.8k views

ADD COMMENT • link updated 7.3 years ago by Kevin Blighe 88k • written 7.3 years ago by genya35 ▴ 50

Please stop using TopHat. Even more so with Ion Torrent data.

Quote from TopHat web site:

Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.

There are much better solutions out there (HISAT2, STAR and many other splice-aware aligners).

ADD REPLY • link 7.3 years ago by GenoMax 148k

Could you please suggest parameters to use to optimize alignment?

ADD REPLY • link 7.3 years ago by genya35 ▴ 50

Hi, why don't you take a look at these:

If you must use TopHat out of curiosity, then just provide it with good data, i.e., reads >50 bp that have base qualities >30 at the read ends. Start with the quality thresholds set high and then relax them if needed.

As GenoMax mentioned, TopHat is effectively retired, and it has been replaced by HISAT2.

ADD REPLY • link 7.3 years ago by Kevin Blighe 88k

Hi Kevin,

Sorry for the basic question, but I just ran my FASTQ file through HISAT using default parameters, and here are the stats:

345063 reads; of these:
345063 (100.00%) were unpaired; of these:
35597 (10.32%) aligned 0 times
260962 (75.63%) aligned exactly 1 time
48504 (14.06%) aligned >1 times
89.68% overall alignment rate
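As a quick sanity check on a summary like this, the percentages can be recomputed from the raw counts. A minimal sketch in Python, using only the numbers printed above (the variable names are just illustrative):

```python
# Counts copied from the HISAT summary above.
total = 345063
aligned_0 = 35597       # aligned 0 times
aligned_1 = 260962      # aligned exactly 1 time
aligned_multi = 48504   # aligned >1 times

def pct(n, total):
    """Return n as a percentage of total, formatted to two decimals."""
    return f"{100.0 * n / total:.2f}"

# The three categories should account for every read.
assert aligned_0 + aligned_1 + aligned_multi == total

# Overall alignment rate = reads aligned at least once / all reads.
overall = pct(aligned_1 + aligned_multi, total)

print("unaligned:", pct(aligned_0, total))
print("unique:", pct(aligned_1, total))
print("multi:", pct(aligned_multi, total))
print("overall:", overall)
```

The overall rate is simply the uniquely aligned plus multi-mapped reads divided by the total, so it is the complement of the "aligned 0 times" fraction.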

It produced a .bam (19,219 K) and an index .bai. I've imported the file into IGV using the link but I don't see anything. Where are the reads? Thanks for your help.

ADD REPLY • link updated 7.3 years ago by GenoMax 148k • written 7.3 years ago by genya35 ▴ 50

You have to zoom in significantly before you start seeing the reads. You have a very small number of reads (for an RNA-seq dataset), so you can either move around the genome in IGV until you find the reads, or pick a gene you know should be represented and go to that region directly.
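One way to find where the reads are without scrolling is to look at per-contig read counts, for example the tab-separated output of `samtools idxstats` (columns: reference name, length, mapped reads, unmapped reads). Below is a minimal sketch in Python that ranks contigs by mapped reads. The function name and the example text are made up for illustration and are not taken from the poster's data:

```python
def busiest_contigs(idxstats_text, top=3):
    """Rank contigs by mapped-read count from idxstats-style text."""
    rows = []
    for line in idxstats_text.strip().splitlines():
        name, length, mapped, unmapped = line.split("\t")
        if name == "*":
            # The "*" row holds reads with no reference position; skip it.
            continue
        rows.append((name, int(mapped)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]

# Hypothetical idxstats output, for illustration only.
example = (
    "chr1\t248956422\t1200\t0\n"
    "chr2\t242193529\t45000\t0\n"
    "*\t0\t0\t35597\n"
)

print(busiest_contigs(example, top=1))
```

With targeted data, most reads should pile up on the few contigs that carry the panel genes, which narrows down where to zoom in.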

ADD REPLY • link 7.3 years ago by GenoMax 148k

Thank you so much, I see them now! Is there a way to optimize the alignment? Thanks

ADD REPLY • link 7.3 years ago by genya35 ▴ 50

What does optimize mean? How could it be improved?

ADD REPLY • link 7.3 years ago by WouterDeCoster 47k

Lastly, I used RNA STAR to align the FASTQ and the output is looking good. I was able to create an aligned BAM file and I can see soft-clipped bases in IGV. However, how do I see the full-length fusion reads that were not mapped? I can see evidence of a fusion but would really like to see the unmapped reads. Could you please suggest how to accomplish this? Thank you.
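For reference, unmapped reads are marked in SAM/BAM by bit 0x4 of the FLAG field, so they can be separated out only if the aligner wrote them to the output in the first place (in STAR that is controlled by `--outSAMunmapped`). A minimal sketch in Python of the flag test on SAM text lines follows. The function name and the two example records are hypothetical:

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def unmapped_records(sam_lines):
    """Yield SAM alignment lines whose FLAG has the unmapped bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no FLAG field; skip them.
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED:
            yield line

# Hypothetical records, for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr2\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTGCA\tIIIIIIIIII",
]

names = [rec.split("\t")[0] for rec in unmapped_records(example)]
print(names)
```

On a real BAM file, `samtools view -f 4` applies the same flag filter.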

ADD REPLY • link 7.3 years ago by genya35 ▴ 50

That's a completely different question than the one you started with. A separate thread would be appropriate. Don't forget to be as informative as possible and include all necessary information in your post.

ADD REPLY • link 7.3 years ago by WouterDeCoster 47k