Tophat-Fusion-Post On A Non-Well Annotated Genome
12.1 years ago

Hello,

I'm currently using TopHat 2.0.4 with --fusion-search to discover fusion transcripts in a poorly annotated genome. I created my own annotation from the gene sequences of a related species (which is annotated), so I have the genome FASTA file and a GFF file for the annotation. I ran TopHat 2.0.4 on all my samples and now want to run tophat-fusion-post. I read the TopHat manual, but I have difficulty understanding how to run tophat-fusion-post on non-human samples. What about refGene.txt and ensGene.txt? Do I have to create them from my annotation? Same question for the BLAST databases.

Thanks a lot for your help,

N.

12.1 years ago

Well, I will just say that the "manual" they have is woefully inadequate. A disclaimer: I have not used the tool, I have only read the paper (your link got me interested in it). So what follows is my opinion, based mainly on the paper:

The refGene.txt and ensGene.txt files need to be present, and you will need to generate them from your annotation. I think they can contain the same data: the tool was originally designed to combine annotations from RefSeq and Ensembl, so each file is supposed to hold the annotations from one of those resources.
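
To make the "generate them from your annotation" step concrete, here is a minimal Python sketch that turns the exons of one transcript into a refGene-style row. It assumes the 16-column UCSC refGene table layout (a leading bin column followed by the extended genePred fields). The function name `gff_exons_to_refgene_row` and the example identifiers are hypothetical, the transcript is written as non-coding (CDS start and end both set to the transcript end), and the bin is a placeholder 0. Whether tophat-fusion-post needs real bin or CDS values is something I have not verified.

```python
def gff_exons_to_refgene_row(transcript_id, chrom, strand, exons, gene_name):
    """Build one tab-separated refGene-style row from GFF exon coordinates.

    exons: list of (start, end) pairs in GFF convention (1-based, inclusive).
    Assumption: the transcript is treated as non-coding, so no CDS is emitted.
    """
    exons = sorted(exons)
    # GFF starts are 1-based; genePred starts are 0-based. Ends are unchanged.
    starts = [start - 1 for start, _ in exons]
    ends = [end for _, end in exons]
    tx_start, tx_end = starts[0], ends[-1]

    fields = [
        "0",                                   # bin (placeholder, an assumption)
        transcript_id,                         # name
        chrom,                                 # chrom
        strand,                                # strand
        str(tx_start),                         # txStart
        str(tx_end),                           # txEnd
        str(tx_end),                           # cdsStart (== txEnd: no CDS)
        str(tx_end),                           # cdsEnd
        str(len(exons)),                       # exonCount
        ",".join(map(str, starts)) + ",",      # exonStarts, trailing comma
        ",".join(map(str, ends)) + ",",        # exonEnds, trailing comma
        "0",                                   # score
        gene_name,                             # name2
        "unk",                                 # cdsStartStat
        "unk",                                 # cdsEndStat
        ",".join(["-1"] * len(exons)) + ",",   # exonFrames (-1: no frame)
    ]
    return "\t".join(fields)


# Hypothetical two-exon transcript on a hypothetical scaffold.
row = gff_exons_to_refgene_row(
    "tx1", "scaffold1", "+", [(101, 200), (301, 400)], "geneA"
)
```

Grouping the exon lines of a real GFF file by transcript is left out here, because the attribute used for the parent transcript differs between GFF flavours.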

BLAST is used to filter out false fusions based on sequence similarity. You will need to download the existing BLAST databases from the links in the manual, and also build a BLAST database from your genome if it is not already contained in the nt database.
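
As a rough sketch of the genome database step, assuming NCBI BLAST+ is installed: `makeblastdb` builds a nucleotide database from a FASTA file. The helper names and the paths below are hypothetical, and where tophat-fusion-post expects to find the database (or what it should be called) is not something I have checked, so treat the output location as a placeholder.

```python
import shutil
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Return the argument list for building a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def build_genome_db(fasta_path, db_name):
    """Run makeblastdb, failing early if BLAST+ is not on the PATH."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb not found; is BLAST+ installed?")
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)


# Hypothetical paths; nothing is executed until build_genome_db() is called.
cmd = makeblastdb_command("genome.fa", "blast/my_genome")
```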

OK, thanks! So, in summary:
- refGene.txt and ensGene.txt generated from my annotation
- a BLAST database of the genome

And that's it?

Well, as always, the devil is in the details - see what happens.

12.1 years ago

I have one more small question about ensGene.txt and refGene.txt: what is the format of these files?

Here is a line from refGene.txt (ensGene.txt is in the same format). I have put each column on its own line.

271
NM_001080475
chr2
-
208394256
208598529
208401287
208574608
8
208394256,208434073,208481482,208503894,208519335,208549619,208573998,208598357,
208401465,208434231,208481546,208504088,208519481,208550555,208574926,208598529,
0
PLEKHM3
cmpl
cmpl
2,0,2,0,1,1,0,-1,
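
For reference, these 16 columns look like the UCSC refGene table layout: bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames. A minimal Python sketch that labels the fields of the line above, assuming tab-separated columns (the function name `parse_refgene_line` is only illustrative):

```python
FIELD_NAMES = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]
INT_FIELDS = {"bin", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "score"}
LIST_FIELDS = {"exonStarts", "exonEnds", "exonFrames"}


def parse_refgene_line(line):
    """Split one tab-separated refGene-style line into a dict of typed fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} columns, got {len(cols)}")
    record = {}
    for field, value in zip(FIELD_NAMES, cols):
        if field in INT_FIELDS:
            record[field] = int(value)
        elif field in LIST_FIELDS:
            # Lists are comma-separated with a trailing comma, so skip empties.
            record[field] = [int(item) for item in value.split(",") if item]
        else:
            record[field] = value
    return record


# The example line from this post, joined back into a single row.
example = "\t".join([
    "271", "NM_001080475", "chr2", "-", "208394256", "208598529",
    "208401287", "208574608", "8",
    "208394256,208434073,208481482,208503894,208519335,208549619,208573998,208598357,",
    "208401465,208434231,208481546,208504088,208519481,208550555,208574926,208598529,",
    "0", "PLEKHM3", "cmpl", "cmpl", "2,0,2,0,1,1,0,-1,",
])
record = parse_refgene_line(example)
```

The three list columns should each have exonCount entries, which gives a cheap sanity check for hand-made files.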

Thanks
