gtf to fasta conversion
10.0 years ago
amoltej ▴ 100

Hello there,

I am new to this field. I recently ran a differential expression analysis with DESeq and ended up with a list of genes that are differentially expressed in different tissues. Because the list is long and covers many transcripts, I would like to extract all the transcript sequences in fasta format using a gtf (or gff3) file and the genome scaffold file. This is not a model organism, and I made the gtf file with the Scipio program.

Can somebody please help me?

Thank you in advance

Amol

gene RNA-Seq sequence • 27k views
10.0 years ago
David Fredman ★ 1.1k

The gffread utility in the Cufflinks package will extract transcript fasta given a gtf/gff and reference (genome) fasta file. For all the options:

gffread -h

To get only the DE transcripts, either subset the gff/gtf or, perhaps more straightforwardly, subset the fasta file (see here for multiple ways of doing that).
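As one possible way to subset the fasta file, here is a minimal Python sketch using only the standard library. It assumes that each record's ID is the first whitespace-delimited token of its header line and that the DE transcript IDs are stored one per line in a text file. The function names and the file names in the usage note are hypothetical, not part of gffread or any other tool.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip() and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(lines, wanted_ids):
    """Yield only the records whose ID is in wanted_ids.

    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    wanted = set(wanted_ids)
    for header, seq in read_fasta(lines):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        if record_id in wanted:
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: script.py all.fasta de_ids.txt subset.fasta
    fasta_path, ids_path, out_path = sys.argv[1:4]
    with open(ids_path) as handle:
        ids = [line.strip() for line in handle if line.strip()]
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for header, seq in subset_fasta(fasta, ids):
            out.write(f">{header}\n{seq}\n")
```

Saved as, say, `subset_fasta.py`, it could be run as `python subset_fasta.py your_transcripts.fasta de_ids.txt de_transcripts.fasta`. Each kept sequence is written on a single line. The IDs in the list must match the fasta IDs exactly, including case.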


Thank you so much for the quick reply. I tried that but could not get any output. I don't know if I am doing something wrong. Can you please provide the actual command?

Thank you

gffread your_transcripts.gff -g genomic_reference.fasta -w your_transcripts.fasta

Make sure that the chromosome/scaffold ids are the same in gff and genomic reference (capitals, underscores etc).
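A quick way to check this is to compare the first column of the gff with the fasta header IDs. The sketch below is only an illustration with hypothetical function names. It assumes the fasta ID is the first whitespace-delimited token after `>` and that the comparison should be exact and case-sensitive.

```python
def gff_seqids(lines):
    """Collect the sequence IDs (column 1) from GFF/GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; stop reading features
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def fasta_ids(lines):
    """Collect the IDs (first token after '>') from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_seqids(gff_lines, fasta_lines):
    """Return the GFF sequence IDs that have no exact match in the FASTA."""
    return sorted(gff_seqids(gff_lines) - fasta_ids(fasta_lines))
```

For example, `missing_seqids(open("your_transcripts.gff"), open("genomic_reference.fasta"))` returns a list of the scaffold names used in the gff but absent from the reference. An empty list means the names agree.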


I was doing the same... but it doesn't work!


That's odd. Assuming the chromosome names are correct, the only reason I can think of is a gff format that gffread does not understand.

Either try to validate your gff, or try a different tool. Perhaps bedtools will be more forgiving:

http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html
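Before switching tools, a basic sanity check of the gff can rule out the most common structural problems. The sketch below is not a full validator and does not reproduce the checks gffread itself performs. It only tests, under my own assumptions, that each feature line has 9 tab-separated columns, integer coordinates with 1 <= start <= end, and a strand of `+`, `-`, `.` or `?`. The function name is hypothetical.

```python
def check_gff_lines(lines):
    """Return a list of (line_number, message) for structurally suspicious lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; stop checking
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(fields)}")
            )
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            problems.append((number, "start/end are not integers"))
        else:
            if start < 1 or start > end:
                problems.append((number, "start must be >= 1 and <= end"))
        if fields[6] not in {"+", "-", ".", "?"}:
            problems.append((number, f"unexpected strand {fields[6]!r}"))
    return problems
```

Running `check_gff_lines(open("your_transcripts.gff"))` and printing the result shows which lines to look at first. A file whose columns are separated by spaces instead of tabs, for instance, is reported on every feature line.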
