2.6 years ago
otieno43
Hello,
I am trying to extract the FASTA sequences of transcripts from a genome FASTA file and a GTF annotation file with gffread, using the command: gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf
However, when I run it, the output FASTA file is empty (0 bytes). I can't figure out what the problem is. Can anyone help, please?
Here is the command I used and the results:
./gffread/gffread -w transcripts.fasta -g /Users/ee/Desktop/genome/VectorBase-56_GfuscipesIAEA2018_Genome.fasta /Users/ee/Desktop/genome/Gfus_x10_vNew.gtf
Results
290K VectorBase-56_GfuscipesIAEA2018_Genome.fasta
0B transcripts.fa
2.1K gclib
Thanks, Erick
Are the FASTA and GTF obtained from the same source? You may want to include links to them. It's important that the GTF corresponds to the correct assembly and that formatting, such as chromosome names, is the same in both files.
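Because gffread has to find each GTF sequence name (column 1) among the genome FASTA headers, one quick way to test the chromosome-name point is to compare the two sets of names. This is a minimal sketch of a hypothetical helper script, not part of gffread. It assumes uncompressed files, a tab-separated GTF, and that the sequence name is the first whitespace-separated word of each FASTA header (the convention samtools faidx uses).

```python
import sys
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Sequence IDs from FASTA headers: the first word after '>' (assumed convention)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gtf_seqids(lines: Iterable[str]) -> Set[str]:
    """Sequence names from column 1 of a GTF, skipping comment and blank lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def main(fasta_path: str, gtf_path: str) -> None:
    with open(fasta_path) as fa, open(gtf_path) as gtf:
        genome = fasta_ids(fa)
        annot = gtf_seqids(gtf)
    shared = genome & annot
    print(f"FASTA sequences: {len(genome)}")
    print(f"GTF sequence names: {len(annot)}")
    print(f"Shared names: {len(shared)}")
    # Show up to 10 GTF names with no matching FASTA header (e.g. 'chr1' vs '1').
    for name in sorted(annot - genome)[:10]:
        print(f"  missing from FASTA: {name}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} genome.fa annotation.gtf", file=sys.stderr)
```

Run it as, for example, python check_seqids.py genome.fa annotation.gtf (the script name is hypothetical). If "Shared names" is 0, gffread likely has nothing it can extract, which would be consistent with an empty output file.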
On top of the important checks mentioned by rpolicastro, you can check which feature types are present in the GTF file (you need a transcript feature for that command to work). You may use
agat_sp_extract_sequences.pl
from AGAT, which lists the feature types found in your file at the beginning of its parsing log.
The GTF file has transcript features in it. And yes, the FASTA and GTF files are from the same source, and the GTF features correspond correctly to the genome assembly. Everything seems correct; I just can't figure out what the issue is. I can share the GTF and genome FASTA files with you.
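If AGAT isn't installed, a rough way to see which feature types a GTF contains is to tally column 3 directly. This minimal sketch is a hypothetical script, not part of AGAT or gffread; it assumes a tab-separated GTF/GFF and skips blank lines and lines starting with '#'.

```python
import sys
from collections import Counter
from typing import Iterable


def feature_types(lines: Iterable[str]) -> Counter:
    """Count the feature type (column 3) of every GTF/GFF record."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as gtf:
            # Print feature types from most to least frequent.
            for ftype, n in feature_types(gtf).most_common():
                print(f"{ftype}\t{n}")
    else:
        print(f"usage: {sys.argv[0]} annotation.gtf", file=sys.stderr)
```

Run it as, for example, python count_features.py annotation.gtf (hypothetical name); each output line is a feature type followed by its count.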
Yea, please share the files and I can check them.