I am new to this field. I recently ran a differential expression analysis with DESeq, and at the end I got the names of genes that are differentially expressed across different tissues. Because the list contains a large number of transcripts, I would like to extract all the transcript sequences in FASTA format using the GTF (or GFF3) file and the genome scaffold file. This is not a model organism, and I made the GTF file with the Scipio program.
The gffread utility in the Cufflinks package extracts transcript sequences in FASTA format, given a GTF/GFF file and the reference (genome) FASTA file. To list all the options:
gffread -h
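For reference, the usual gffread invocation is `gffread -w transcripts.fa -g genome.fa annotation.gtf`, where `-g` gives the genome FASTA and `-w` writes the spliced exon sequence of each transcript (file names are placeholders). When a tool silently returns nothing, it can help to see the same steps spelled out. Below is a minimal Python sketch of the idea, not gffread's implementation. It assumes a GTF whose `exon` rows carry a `transcript_id "..."` attribute; a GFF3 using `Parent=` attributes would need a different regex. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Complement table for reverse-complementing minus-strand transcripts
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the id is the first word after '>'."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_transcripts(gtf_lines, genome):
    """Return {transcript_id: spliced sequence} built from the GTF 'exon' rows."""
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    info = {}                  # transcript_id -> (seqid, strand)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # ignore CDS, gene and other feature types
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(cols[3]), int(cols[4])))
        info[tid] = (cols[0], cols[6])
    out = {}
    for tid, blocks in exons.items():
        seqid, strand = info[tid]
        chrom = genome[seqid]  # a KeyError here means a scaffold name mismatch
        # GTF is 1-based inclusive; Python slices are 0-based half-open
        seq = "".join(chrom[s - 1:e] for s, e in sorted(blocks))
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        out[tid] = seq
    return out


def format_fasta(seqs, width=60):
    """Render {id: sequence} as FASTA text, wrapping sequence lines at `width`."""
    out = []
    for name, seq in seqs.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Exons are sorted by genomic position before joining, and minus-strand transcripts are reverse-complemented after joining, which is the convention transcript FASTA files normally follow.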
To get only the DE transcripts, either subset the GFF/GTF beforehand or, perhaps more straightforwardly, subset the resulting FASTA file (see here for multiple ways of doing that).
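Subsetting the FASTA afterwards takes only a few lines. This sketch assumes the DE names from DESeq are exactly the transcript IDs that appear as the first word of the FASTA headers; if DESeq was run on gene IDs, a gene-to-transcript mapping from the GTF would be needed first (not shown). `subset_fasta` and `subset_fasta_file` are hypothetical helper names.

```python
def subset_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted
        if keep:
            yield line


def subset_fasta_file(ids_path, fasta_path, out_path):
    """File wrapper: one ID per line in ids_path; matching records go to out_path."""
    with open(ids_path) as fh:
        ids = {line.strip() for line in fh if line.strip()}
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(subset_fasta(src, ids))
```

For example, `subset_fasta_file("de_ids.txt", "transcripts.fa", "de_transcripts.fa")`, where all three file names are placeholders. Streaming line by line means the whole transcript FASTA never has to fit in memory.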
Thank you so much for the quick reply. I tried that but could not get anything, and I don't know what I am doing wrong. Could you please give me the actual command?
Thank you
(written 10.0 years ago by amoltej; updated 2.7 years ago by Ram)
Make sure that the chromosome/scaffold IDs are identical in the GFF and the genome reference (capitalization, underscores, etc.).
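A quick way to check this is to compare column 1 of the GTF with the FASTA header IDs. This sketch assumes the ID is the first word after `>`, which is how samtools faidx and bedtools treat headers; any header text after the first space is ignored. It also flags IDs that differ only in capitalization. The function names are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Sequence IDs as most tools see them: the first word after '>'."""
    return {line[1:].split()[0]
            for line in fasta_lines
            if line.startswith(">") and line[1:].split()}


def gtf_seqids(gtf_lines):
    """Distinct values of the first GTF column (the scaffold/chromosome name)."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_ids(gtf_lines, fasta_lines):
    """Return (GTF seqids missing from the FASTA, {missing id: FASTA id differing only in case})."""
    in_fasta = fasta_ids(fasta_lines)
    missing = sorted(gtf_seqids(gtf_lines) - in_fasta)
    by_lower = {fid.lower(): fid for fid in in_fasta}
    near = {m: by_lower[m.lower()] for m in missing if m.lower() in by_lower}
    return missing, near
```

An empty `missing` list means every scaffold named in the GTF exists in the genome FASTA, which rules out the most common cause of empty output.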
I was already doing that... but it still doesn't work!
That's odd. Assuming the chromosome names are correct, the only other reason I can think of is a GFF format that gffread does not understand.
Either try to validate your GFF, or try a different tool. Perhaps bedtools will be more forgiving:
http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html
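bedtools getfasta can produce spliced sequences from BED12 input, e.g. `bedtools getfasta -fi genome.fa -bed transcripts.bed12 -s -split -name -fo transcripts.fa`, where `-split` concatenates the BED12 blocks (exons) and `-s` reverse-complements minus-strand features (file names are placeholders). Depending on the bedtools version, the output headers may also carry coordinates and strand next to the name. A minimal sketch for turning GTF exon rows into BED12, assuming each `exon` row has a `transcript_id "..."` attribute; `gtf_to_bed12` is a hypothetical name.

```python
import re
from collections import defaultdict


def gtf_to_bed12(gtf_lines):
    """Convert GTF exon rows into one BED12 line per transcript (0-based, half-open)."""
    exons = defaultdict(list)  # transcript_id -> [(bed_start, bed_end), ...]
    info = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF start is 1-based inclusive -> subtract 1; the end coordinate is unchanged
        exons[tid].append((int(cols[3]) - 1, int(cols[4])))
        info[tid] = (cols[0], cols[6])
    bed = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tid]
        start, end = blocks[0][0], max(e for _, e in blocks)
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - start) for s, _ in blocks)  # relative to chromStart
        fields = [chrom, start, end, tid, 0, strand, start, end, 0,
                  len(blocks), sizes, starts]
        bed.append("\t".join(map(str, fields)))
    return bed
```

Writing the returned lines to a file (one per line) gives the `-bed` input for the command above.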