I am going to run TopHat to map my RNA-seq reads to the human genome, and I found that one of the options lets me provide TopHat with an annotation file: -G/--GTF <GTF/GFF3 file>.
I also found the following note in the TopHat manual:
Please note that the values in the first column of the provided GTF/GFF file (column which indicates the chromosome or contig on which the feature is located), must match the name of the reference sequence in the Bowtie index you are using with TopHat. You can get a list of the sequence names in a Bowtie index by typing:
bowtie-inspect --names your_index
So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above.
So I checked the first column in my GTF, but my question is how to use bowtie-inspect to check the index (I mean, which file should I pass to it?).
Hi Pandey, thanks for helping me, but I still need some help on this point. I did the following steps:
I downloaded the human annotation GTF file from the Ensembl website, checked the first column, and found that the chromosomes are named 1, 2, ..., X, Y, not chr1, chr2, etc.
I downloaded the index from the Bowtie website (I mean, I didn't build the index myself).
My question is how to check the index to make sure its chromosomes have the same names: 1, 2, ...
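For the GTF side of the check, here is a minimal Python sketch that lists the distinct names in the first column. It assumes a tab-separated GTF whose header/comment lines start with "#"; the function name gtf_seqnames and the path annotation.gtf are placeholders of mine, not anything from TopHat or Ensembl.

```python
def gtf_seqnames(lines):
    """Return the distinct column-1 values, in order of first appearance.

    `lines` is any iterable of GTF lines (for example an open file handle).
    """
    ordered = []
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        # The chromosome/contig name is everything before the first tab.
        name = line.split("\t", 1)[0]
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


# Usage (placeholder path):
# with open("annotation.gtf") as handle:
#     print(gtf_seqnames(handle))
```

The names are compared exactly as written, so "1" and "chr1" count as different, which is the case-sensitive match the manual asks for.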
You need to use bowtie-inspect, which comes with Bowtie. See this link for how to use it.
Basically, run bowtie-inspect from the command line and give it the location of the index plus the basename, that is, the prefix common to all the index files (not any single index file).
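To make the "which file do I pass" point concrete, here is a rough Python sketch. It assumes the usual Bowtie index naming, where the files look like your_index.1.ebwt, your_index.rev.1.ebwt and so on (or .bt2 for Bowtie 2), and it strips that ending to recover the basename. The helper names index_basename and index_seqnames are my own; the only real command used is bowtie-inspect --names from the manual note (with a Bowtie 2 index the corresponding tool is bowtie2-inspect).

```python
import re
import subprocess

# Matches the per-file ending of an index file, e.g. ".1.ebwt", ".rev.2.ebwt",
# ".3.bt2", or the large-index variants ending in "l".
_INDEX_ENDING = re.compile(r"(\.rev)?\.\d+\.(ebwt|bt2)l?$")


def index_basename(index_file):
    """Turn the path of one index file into the basename shared by all of them."""
    return _INDEX_ENDING.sub("", index_file)


def index_seqnames(basename, tool="bowtie-inspect"):
    """Run `<tool> --names <basename>` and return one name per output line."""
    result = subprocess.run(
        [tool, "--names", basename],
        capture_output=True,
        text=True,
        check=True,  # raise if the tool reports an error
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# Usage (placeholder file name):
# names = index_seqnames(index_basename("your_index.1.ebwt"))
```

So if the downloaded archive unpacks to your_index.1.ebwt, your_index.2.ebwt, etc., the argument is your_index, with the directory in front if you are not running from that folder.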
I used the following command to check the names in the index, and I got names without chr, as shown below.
I also checked the annotation file and found that its names are also without chr; the only difference is the order of the chromosomes, as shown below.
My question is: does the difference in order cause any problem? (I mean, is it okay to use both of them even though the order differs?)
Just try it and check. The most harm it can do is throw an error that the order doesn't match, or it may work regardless of the order of the chromosomes in the two files. Don't be scared that you will mess something up. Always keep a backup of your files.
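If you want to confirm before running that the order really is the only difference, a small Python sketch like the following compares the two name lists as sets. The function name compare_names and the example lists are made up for illustration; it says nothing about whether TopHat itself cares about the order, only whether the names agree.

```python
def compare_names(index_names, gtf_names):
    """Report names missing on either side and whether the order is identical."""
    index_set = set(index_names)
    gtf_set = set(gtf_names)
    return {
        # Reference sequences in the index with no features in the GTF.
        "only_in_index": sorted(index_set - gtf_set),
        # GTF names with no matching sequence in the index; these are the
        # mismatches the manual's note is about.
        "only_in_gtf": sorted(gtf_set - index_set),
        # True only if both lists have the same names in the same order.
        "same_order": list(index_names) == list(gtf_names),
    }


# Made-up example: same names, different order.
report = compare_names(["1", "2", "X", "Y"], ["1", "X", "2", "Y"])
print(report)
```

If both "only_in_..." lists come back empty, the names match exactly and any remaining difference is just the ordering.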