Create a custom .gtf file with a list of genes
10.3 years ago

Hi Guys

I am new to the RNA-Seq world and just starting out with Linux. I need to create a custom gene annotation file containing a list of genes I am interested in analyzing. How do I do that?

RNA-Seq sequence blast • 18k views

What do you have to start with and how big is your list of genes?


You need to add a little more information. Normally, if you are working with a well-studied species like human or mouse, you can download its GFF file from Ensembl or UCSC. You can use either the full GFF or a subset of it. If your organism doesn't have an annotated reference genome, then you can use the Tuxedo suite tools for your RNA-seq analysis.


BTW, if you're just starting out and not doing something extremely simple, then you might be best off finding a local collaborator.

10.3 years ago
Dejian ★ 1.3k

If you are studying a well-annotated species, you can download a GTF or GFF file from Ensembl, NCBI, or UCSC. Then you just filter the GTF/GFF file and keep the lines related to your genes. That's it. You can also check the TopHat website to see whether your species is on their list. If it is, you can choose one of the three sources of annotation. They provide a full set of information.
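To make the filtering step concrete, here is a minimal Python sketch. It assumes a tab-separated GTF whose ninth column carries `gene_id "..."` (and optionally `gene_name "..."`) attributes, and a plain text file with one gene ID or name per line. The function names and file names are made up for illustration; adapt them to your own data.

```python
import re

# Matches gene_id "X" or gene_name "X" in the GTF attribute column.
ATTR_PATTERN = re.compile(r'\b(gene_id|gene_name) "([^"]+)"')

def load_gene_list(path):
    """Read one gene ID or name per line, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}

def filter_gtf_lines(lines, wanted):
    """Yield header comments plus GTF lines whose gene_id or gene_name is in `wanted`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines as they are
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid GTF record, skip it
        attrs = dict(ATTR_PATTERN.findall(fields[8]))
        if attrs.get("gene_id") in wanted or attrs.get("gene_name") in wanted:
            yield line

def filter_gtf_file(gtf_path, gene_list_path, out_path):
    """Write the subset of `gtf_path` that matches the gene list to `out_path`."""
    wanted = load_gene_list(gene_list_path)
    with open(gtf_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src, wanted))
```

Calling `filter_gtf_file("annotation.gtf", "my_genes.txt", "custom.gtf")` (hypothetical file names) would write the custom GTF. GFF3 files store attributes as `key=value` pairs instead, so the pattern would need to change for that format.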

However, if you are studying a newly sequenced species, you will probably have to generate the annotation for those genes yourself. The file must conform to the GTF specification. Currently, GTF2 and GFF3 are both popular.
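As a rough illustration of what conforming to GTF means, the sketch below builds one GTF2-style record: nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), with `gene_id` and `transcript_id` in the attribute column. The helper name and the example values are hypothetical, and the checks cover only the basics, not the full specification.

```python
def gtf_line(seqname, source, feature, start, end, strand,
             gene_id, transcript_id, score=".", frame="."):
    """Format one GTF2-style record as a tab-separated string (no newline)."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates are 1-based with start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    # Attributes are key "value" pairs, each terminated by a semicolon.
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return "\t".join([seqname, source, feature, str(start), str(end),
                      score, strand, frame, attributes])

# Example with made-up values: one exon of a hand-annotated gene.
record = gtf_line("scaffold1", "custom", "exon", 100, 250, "+", "geneA", "geneA.t1")
```

Writing one such record per exon (and per CDS, if known) gives a file that most RNA-seq tools expecting GTF input should be able to parse, though individual tools may require extra attributes.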


Could you please define "just filter the GTF/GFF file and get the lines related to your genes"?

I am studying an almost-not-annotated species and therefore trying to make my own annotation...to date with no success :(

Thank you in advance!


You mentioned "a list of genes" you are interested in analyzing. The filtering process simply singles out the lines for those genes. Unfortunately, you are studying a non-model species. If the genome has been sequenced, check whether a genome annotation is provided with it. If the genome has not been sequenced, you first need to assemble transcripts using Trinity or a similar assembler. After obtaining the transcripts, you can annotate them by BLAST-ing them against protein and RNA sequences from closely related species.
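For the BLAST-based annotation step, one common approach is to keep the best hit per transcript. The sketch below assumes BLAST was run with tabular output (`-outfmt 6`), whose default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, and bit score. The function name is made up, and choosing the best hit by bit score is only one possible rule.

```python
def best_hits(blast_lines):
    """Map each query (transcript) ID to its (subject ID, bit score) with the highest bit score."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a default 12-column tabular record
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        # Keep the first hit seen, then replace it only with a strictly better one.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

The resulting mapping can then be used to attach a putative gene name to each assembled transcript. Hits with weak e-values or short alignments should still be checked by hand before they go into an annotation.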
