Hi all,
I am interested in pulling out "CDS" fasta sequences from a GFF3 file. My GFF3 file lists one or more fasta sequences at the end.
I looked into using the gffutils
python package but according to this documentation, it does not deal with fasta sequences listed at the end of the GFF3 file.
Am I stuck writing my own parser, or is there a better solution?
Thanks for any suggestions!
EDITED to add a snippet of my GFF file:
##gff-version 3
##sequence-region Consensus_10_consensus_sequence_1 1 645439
Consensus_10_consensus_sequence_1 feature gene 551 2104 . + . name=gene
Consensus_10_consensus_sequence_1 feature CDS 551 2104 . + 0 name=YALI0C06512p CDS
...
Consensus_9_consensus_sequence_4 feature mRNA 9625 9891 . + . name=mRNA
Consensus_9_consensus_sequence_4 feature CDS 9625 9891 . + 0 name=YALI0A21351p CDS
##FASTA
>Consensus_10_consensus_sequence_1 <unknown description>
TGCCACCTCCAAATTAACTCTCGCTTATTTCTTGTACCTGTCATATCACGTGATGTAGCT
TCCCAATCAAGAGCGGATCCTGCCTGTTTGGCTGCGTGGGTTTGCGTCTTCTTTCCGTTT
GAAGCAGTGGTATTATTCCCCCATTGTGCCAAAAGTAATGCTGAAAAGATGCCCACGAAT
>Consensus_10_consensus_sequence_2 <unknown description>
CCGAAACCACAGCCATGATCAGAGTCACTCCTATTCCAACCCCGCCAACTGCCGTGGGGT
ACACGTTATATTGGGTAGGTGTGTATCCTTGAGACTTGAGCCAGGAGATCATGGAAGGTT
GGGTGCTGGCAATACAGGTGTTATTGTAGCAAAGAAAGATGAGAGAGAAGAAATAAATGT
GCCAGGTTTTCAAAGAGCGTTTCAAAACTTGCAAGAACGGCTCTTCTTTGTTACCGGTAT
CGCCGATTTGTGCTCTTCTCTCCTTGGCTATAGCAATGTCTTCCTTGGTGAAGTACCAGC
TGGTAGTTGTCTCCGGTGTGTTTGGGTTGACGAACATGGTGTAAAGTGCCACTGGAAATG
>Consensus_10_consensus_sequence_2 <unknown description>
CACTCTCAAGCATTTAGGAACTTGTCAAGAGGTTCAAAGGTTGGAACTTGCAACTGAACT
GATCGCAACATAATCACCGCTATTAAACCCTGATTACAAGTGCTTCTGATTGCATCCACA
GTTCATTTCCATGGGCTAGGCTATACGAAAATACAAGGATTAGAAACTATATACAATTGA
CTCTGCAATCTTTCCCGCTAAACGGTGGTGTGGTTATGACCTGGCTCGTGTTCATGGCCG
...
Please post a snippet of your GFF3 file.