Rattus norvegicus Rnor5.0 GTF annotation in GENCODE format?
10.3 years ago

Dear BioStars,

I am using a piece of software (PARalyzer for PAR-CLIP data). The pipeline I am using requires a GENCODE-format .gtf annotation. Since there is no official GENCODE annotation for rat, has anybody tried restructuring the UCSC annotation to fit the GENCODE format?

I already tried adding a "filler" second column to match the column order, but there are many more differences in the last, multi-entry column. The home-made annotation with the filler column didn't work (everything works perfectly with the GENCODE human and mouse annotations).

I would be very thankful for any tips or maybe files ;)

Have a nice weekend!

Piotr

gencode annotation rat • 5.3k views
10.3 years ago

I tried using and playing around with the Ensembl, UCSC and NCBI .gtf files, with no success.

A few of the important differences are:

  1. Names of chromosomes (GENCODE uses "chr1" instead of "1" and "chrM" instead of "MT")
  2. The second column in GENCODE format is the source of the annotation (ENSEMBL/HAVANA)
  3. The 9th column (with key-value pairs) is quite different as well (e.g. a non-GENCODE .gtf doesn't contain "level" information).
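
To see which keys in the 9th column differ between two files, it can help to parse the key-value pairs and compare the key sets. The sketch below is only an illustration: the two example lines are made up (not taken from any real annotation), and `parse_attributes` and `attribute_keys` are hypothetical helper names.

```python
import re

# Matches pairs such as: gene_id "G1";  or  exon_number 1;
ATTR_RE = re.compile(r'(\S+)\s+("[^"]*"|[^;\s]+)\s*;?')


def parse_attributes(column9):
    """Return a dict of the key-value pairs in a GTF attribute column.

    Repeated keys keep only their last value, which is enough
    for comparing key sets.
    """
    return {key: value.strip('"') for key, value in ATTR_RE.findall(column9)}


def attribute_keys(gtf_line):
    """Return the set of attribute keys found on one GTF feature line."""
    fields = gtf_line.rstrip("\n").split("\t")
    return set(parse_attributes(fields[8]))


# Made-up example lines, for illustration only.
gencode_like = "\t".join([
    "chr1", "HAVANA", "exon", "100", "200", ".", "+", ".",
    'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding"; '
    'exon_number 1; level 2;',
])
other_gtf = "\t".join([
    "1", "ensembl", "exon", "100", "200", ".", "+", ".",
    'gene_id "G1"; transcript_id "T1"; gene_biotype "protein_coding"; '
    'exon_number "1";',
])

# Keys present in the GENCODE-style line but missing from the other one.
only_in_gencode = attribute_keys(gencode_like) - attribute_keys(other_gtf)
print(sorted(only_in_gencode))
```

Running the same comparison on the first few feature lines of a working GENCODE file and of the candidate rat file should show which keys a pipeline might be missing.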

It is possible to restructure the UCSC .gtf into this format and make it very similar, but if somebody already has such a file and knows it works, I would much rather use a tested .gtf than make and test my own while re-inventing the wheel...

Best,
Piotr


You want Ensembl. We make the GENCODE GTFs (for human and mouse) and we make the GTFs for all our other species in the same style.


Are they exactly in the same style?

I used the newest rat Ensembl GTF, but had to make some minor changes. For example, the human GENCODE GTF uses "gene_type" rather than "gene_biotype" in the key-value column, exon numbers are not quoted in the GENCODE GTF, and the chromosome names differ (chr1 in GENCODE vs. 1 in Ensembl).

I ended up writing a Python script to make those minor changes. All seems to work now.
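
For anyone who needs something similar, here is a minimal sketch of the three changes described above. It is not the script mentioned in this post, only an illustration. The function names are hypothetical, and two things are assumptions: that only numbered chromosomes, X, Y and MT are renamed (scaffold names pass through unchanged), and that "transcript_biotype" should become "transcript_type" by analogy with "gene_type". The source column (column 2) is left as-is.

```python
import re


def rename_chromosome(name):
    """Map Ensembl-style names to GENCODE-style names (1 -> chr1, MT -> chrM)."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name  # assumption: scaffolds and other names are left unchanged


def convert_line(line):
    """Convert one Ensembl-style GTF line into a GENCODE-like line."""
    if line.startswith("#"):
        return line  # header/comment lines are left untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a regular feature line; leave as-is
    fields[0] = rename_chromosome(fields[0])
    attrs = fields[8]
    # gene_biotype -> gene_type (transcript_biotype handled by analogy)
    attrs = re.sub(r'\b(gene|transcript)_biotype\b', r'\1_type', attrs)
    # exon_number "3" -> exon_number 3
    attrs = re.sub(r'\bexon_number "(\d+)"', r'exon_number \1', attrs)
    fields[8] = attrs
    return "\t".join(fields) + "\n"


# Made-up example line, for illustration only.
example = ('1\tensembl\texon\t100\t200\t.\t+\t.\t'
           'gene_id "G1"; gene_biotype "protein_coding"; exon_number "1";\n')
converted = convert_line(example)
print(converted, end="")
```

Applying `convert_line` to every line of the Ensembl file and writing the results to a new file gives a candidate annotation. Keys that exist only in GENCODE files, such as "level", are not added by this sketch, so a pipeline that requires them would need further changes.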

Best regards,
Piotr


Is this script available on GitHub? I am running into the same problem, and I also have the same problem of not wanting to spend time re-inventing the wheel. My pipeline works perfectly if I use GENCODE GTFs, but does not work if I use the Ensembl GTF for rat.

10.3 years ago
Bert Overduin ★ 3.7k

I have no idea what exactly the difference between the GENCODE GTF format and the regular GTF format is, but have you had a look at the Ensembl GTF file for rat?
