Question

Difference between Gene Models (Annotation files .gtf) for human (hg19)


10.1 years ago

M K ▴ 660

Hi All,

I am working on RNA-seq analysis and plan to use a gene model (annotation file, .gtf) for human (hg19). I found several releases of it, such as GRCh37.55, GRCh37.61, ..., GRCh37.75, ...

When I looked inside each release, I found that the chromosomes are not listed in the same order (i.e., some start with chr1, then chr10, chr11, chr12, ..., and chrY, while others start with GL000213.1, then HSCHR21_2_CTG1_1, then chr18, ..., chr10).

My question: does the chromosome order make any difference when using these files, and what are the differences between the releases (i.e., why does the chromosome order change from release to release)?

RNA-Seq genome • 4.6k views

updated 3.7 years ago by Ram 44k • written 10.1 years ago by M K ▴ 660


Which annotation tool are you using? You shouldn't need to worry about the order unless the tool you are using is picky about it. Tools like GATK and the Tuxedo suite (RNA-seq tools) are picky about the order of chromosomes in BAM and GTF files (for good reasons). I have no idea why the chromosome order changes between releases.
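To see how two releases actually differ in ordering, it may help to list the sequence names in the order they first appear in column 1 of each GTF. The following is only a minimal sketch: the function names and the sample lines are made up for illustration, and it assumes a tab-separated GTF whose header lines start with `#`.

```python
def contig_order(gtf_lines):
    """Return sequence names in the order they first appear in column 1."""
    order = []
    seen = set()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        name = line.split("\t", 1)[0]
        if name not in seen:
            seen.add(name)
            order.append(name)
    return order


def same_contigs_different_order(order_a, order_b):
    """True when both files list the same sequences but in a different order."""
    return set(order_a) == set(order_b) and order_a != order_b


# Hypothetical sample lines standing in for two downloaded releases.
release_a = [
    "#!genome-build GRCh37",
    "1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
    "10\tsrc\texon\t300\t400\t.\t-\t.\tgene_id \"g2\";",
    "1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id \"g3\";",
]
release_b = [
    "10\tsrc\texon\t300\t400\t.\t-\t.\tgene_id \"g2\";",
    "1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
]

order_a = contig_order(release_a)
order_b = contig_order(release_b)
print(order_a, order_b, same_contigs_different_order(order_a, order_b))
```

For real files, the lists above would be replaced by open file handles, e.g. `contig_order(open("release.gtf"))`, where the file name is a placeholder.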

updated 3.7 years ago by Ram 44k • written 10.1 years ago by Ashutosh Pandey 12k


Hi Pandey, Thanks for your comment.

I am not using any tool to create the annotation files. I just downloaded different releases from the Ensembl website, and I am going to use one of them for the -G parameter in TopHat.

updated 3.7 years ago by Ram 44k • written 10.1 years ago by M K ▴ 660


I am sure you have come across this on the TopHat manual page. Just follow what it says and you will be fine:

-G/--GTF <GTF/GFF3 file>

Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.

Please note that the values in the first column of the provided GTF/GFF file (column which indicates the chromosome or contig on which the feature is located), must match the name of the reference sequence in the Bowtie index you are using with TopHat. You can get a list of the sequence names in a Bowtie index by typing:

bowtie-inspect --names your_index

So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above.
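The name check described above can be sketched in a few lines of Python. This is an illustration, not part of TopHat: it assumes the index names come from the output of the bowtie-inspect command (one name per line) and that the GTF is tab-separated with `#` header lines. The helper names and sample values are hypothetical.

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct values of column 1 (chromosome/contig name)."""
    names = set()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(gtf_names, index_names):
    """Return (names only in the GTF, names only in the index), case sensitive."""
    gtf_set = set(gtf_names)
    index_set = set(index_names)
    return sorted(gtf_set - index_set), sorted(index_set - gtf_set)


# Hypothetical inputs: a GTF naming sequences "1" and "MT", and an index
# whose sequences are named "chr1" and "chrM".
sample_gtf = [
    "1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
    "MT\tsrc\texon\t10\t90\t.\t+\t.\tgene_id \"g2\";",
]
sample_index_names = ["chr1", "chrM"]

only_in_gtf, only_in_index = compare_names(
    gtf_seqnames(sample_gtf), sample_index_names
)
print("In GTF but not in index:", only_in_gtf)
print("In index but not in GTF:", only_in_index)
```

If the first list is non-empty, features on those sequences cannot be matched to the index, so the names in the GTF (or the index) would need to be made consistent before running TopHat with -G.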

updated 3.7 years ago by Ram 44k • written 10.1 years ago by Ashutosh Pandey 12k