How To Convert Fasta To Gff?
13.1 years ago
Melissa • 0

Hi, I'm trying to annotate a novel genome. How do I convert a FASTA file to a GFF3 file? Or must I go FASTA -> BED -> GFF? (In that case, how do I convert FASTA to BED?)

Thanks

fasta gff • 26k views

I think the short answer is: you can't, for the reason Steve describes, unless the FASTA header description contains the required information (coordinates and strand).


I'm confused: a FASTA file has sequence information, and a BED or GFF file has coordinate information (with annotation optionally added). Can you give an example of, say, 1 or 2 records from a FASTA file and show us what the corresponding GFF file you want would look like?


I'm guessing that the FASTA file contains the sequence of a feature (say, mRNA) and that she wants to annotate that feature on the genome using a GFF file. So she'll have to align the FASTA file first, then convert the alignment file to BED in some manner.
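
A rough sketch of that last step, in Python. It assumes the alignment was done with BLAST and saved in tabular format (`-outfmt 6`), and it writes GFF3 directly rather than going through BED. The function name `blast6_to_gff3`, the `match` feature type, and the `hit1`, `hit2`, ... IDs are made up for illustration; other aligners would need a different parser.

```python
def blast6_to_gff3(lines, source="blastn", feature_type="match"):
    """Turn BLAST tabular (-outfmt 6) hits into GFF3 lines.

    Standard outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    out = ["##gff-version 3"]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = fields[11].strip()
        # BLAST reports minus-strand hits with sstart > send;
        # GFF3 wants start <= end plus an explicit strand column.
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send), max(sstart, send)
        # Hypothetical ID scheme: hit1, hit2, ... in input order.
        attributes = f"ID=hit{len(out)};Name={qseqid}"
        out.append("\t".join([sseqid, source, feature_type,
                              str(start), str(end), bitscore,
                              strand, ".", attributes]))
    return out
```

Each hit becomes one feature on the genome sequence it aligned to, so a spliced mRNA would show up as several separate `match` lines rather than one gene model.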


I have the same question. When I want to use MCScanX, I need a GFF file, but I do not know how to get one. Since I'm Chinese and my English is poor, I don't even know how to ask a question here (www.biostars.org). I hope someone can help me. Thank you!

13.1 years ago
Scott Cain ▴ 770

So the question is, what do you want to do with this data? I can think of one reason why you might want to create a GFF out of a fasta file: loading EST or cDNA sequences into a Chado database, for example, where you have to specify type and other attribute information during the load, and in fact there is a tool called gmod_fasta2gff3.pl that comes with Chado. Outside of that use case, I can't think of another reason to do this. If you describe what you want to do in more detail, you might get a better answer.
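
To make that use case concrete, here is a minimal Python sketch of the general idea. It is not gmod_fasta2gff3.pl and does not reproduce its options; it only assumes that each FASTA record should become one GFF3 feature spanning its own full length, with a type you supply. The function name `fasta_to_self_gff3` and the default `EST` type are illustrative choices.

```python
def fasta_to_self_gff3(fasta_lines, source=".", feature_type="EST"):
    """Emit one GFF3 line per FASTA record, covering positions 1..length."""
    lengths = {}      # sequence ID -> length, in input order
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            # The ID is the first whitespace-delimited word of the defline.
            current = parts[0] if parts else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    out = ["##gff-version 3"]
    for seq_id, length in lengths.items():
        attributes = f"ID={seq_id};Name={seq_id}"
        out.append("\t".join([seq_id, source, feature_type,
                              "1", str(length), ".", ".", ".", attributes]))
    return out
```

The output says nothing about where a sequence sits on a genome; every feature is located on itself, which is only useful when the loader needs type and attribute information rather than genomic coordinates.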

13.0 years ago
Lee Katz ★ 3.2k

Strictly speaking, you can't convert FASTA to GFF, because FASTA contains sequence information and GFF contains location information. However, you can try to recover the location information from the defline. If your FASTA looks like this,

>CDS_0001 start=start stop=stop contig=contig strand=+ ...
ATGATGATG

then you can try to make a GFF file by parsing the defline into variables and printing one tab-separated line per record:

print join("\t", $contig, "FASTAparser", "CDS", $start, $stop, '.', $strand, '.', $attributes), "\n";

At the very least, you will need contig/chromosome, start, and stop information.
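
Here is a small sketch of the defline parsing, written in Python rather than Perl. It assumes the defline uses the `key=value` layout from the example above (`start`, `stop`, `contig`, `strand`); the function name `defline_to_gff3` and the `ID=` attribute are illustrative, and a real file may need a different parser.

```python
def defline_to_gff3(defline, source="FASTAparser", feature_type="CDS"):
    """Build one GFF3 line from a defline such as
    '>CDS_0001 start=10 stop=18 contig=contig1 strand=+'."""
    parts = defline.strip().lstrip(">").split()
    feature_id = parts[0]
    # Collect key=value pairs; anything without '=' is ignored.
    fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    missing = [k for k in ("contig", "start", "stop") if k not in fields]
    if missing:
        raise ValueError("defline lacks: " + ", ".join(missing))
    # GFF3 requires start <= end, whatever the strand.
    start, stop = sorted((int(fields["start"]), int(fields["stop"])))
    strand = fields.get("strand", ".")
    # The phase column is left as '.' as in the one-liner above; the GFF3
    # specification expects 0, 1 or 2 for CDS features.
    return "\t".join([fields["contig"], source, feature_type,
                      str(start), str(stop), ".", strand, ".",
                      f"ID={feature_id}"])
```

For example, `defline_to_gff3(">CDS_0001 start=10 stop=18 contig=contig1 strand=+")` gives a line with `contig1` in the first column and `10` and `18` as the coordinates. A defline without coordinates raises `ValueError`, which is the "you can't" case described in this thread.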

Details on how to properly format GFF can be found on this page: http://gmod.org/wiki/GFF (coincidentally a GMOD webpage, which is what Scott Cain works on!)
