Hi, I am new to tophat2 and am using a paper as a reference. The paper lists the options the authors used [-r 25 --coverage-search -G --library-type fr-firststrand], but I am getting an error. This is the command line I use:
$ tophat -r 25 --coverage-search -G --library-type fr-firststrand /my_index directory/bowtie2/mm9 mysample.fastq &> tophat.log
In my log file it says:
Error: cannot find transcript file --library-type
I assume the error is due to using -G without providing an annotation file; however, the paper gives no details other than that the mm9 genome was used for mapping. I have read the tophat manual but couldn't figure out the cause of the error. Can anybody help me with this? Thanks!!
The -G option is used to create a "transcriptome"-specific index from a whole-genome index by providing a GTF/GFF file like this. This is a one-time run. The index can then be reused in subsequent runs, for all samples, to align to just that part of the genome.
When you actually use this index you need to provide the location for it by using
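Regarding the error in the question: -G expects the path of a GTF/GFF file as its next argument, so in the original command tophat read --library-type as the transcript file name, which matches the log message. A minimal sketch of a corrected call follows. The file name mm9_genes.gtf and the index prefix bowtie2/mm9 are placeholders (assumptions, not values from the paper), and the annotation has to match the mm9 build.

```shell
#!/bin/sh
# Placeholder paths (assumptions): replace with your own files.
GTF="mm9_genes.gtf"      # annotation for the same genome build as the index
INDEX="bowtie2/mm9"      # Bowtie2 index prefix
READS="mysample.fastq"   # single-end reads, as in the original command

# -G consumes the next argument as the annotation file, so the GTF path
# must come directly after it, before --library-type.
set -- tophat -r 25 --coverage-search -G "$GTF" \
    --library-type fr-firststrand "$INDEX" "$READS"

# Show the assembled command for inspection.
CMD_STR="$*"
echo "$CMD_STR"

# Run it only if tophat is installed and the annotation file exists.
if command -v tophat >/dev/null 2>&1 && [ -f "$GTF" ]; then
    "$@" > tophat.log 2>&1
fi
```

The only change to the original option list is the annotation path placed right after -G; everything else is kept as given in the question.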
Thank you very much! So can I use any GTF file with known genes, regardless of the project? For example, can I use the mouse GTF gene sets from http://useast.ensembl.org/info/data/ftp/index.html after the -G option?
If you are trying to replicate the analysis in a paper, make sure you get the annotation from the same source and for the same genome build. Otherwise your results will differ from those in the paper.
Thank you very much for the help. One last thing: in general, can I use RefSeq for annotation, or does it have to be a specific location on a chromosome?
If I understand the question correctly: RefSeq annotations would be standalone (though the accession numbers may be included in the GTF file you will use). So if you want to correlate gene names with RefSeq IDs, you should be able to do that.
Thank you very much!