9.5 years ago
biolab
Hi everyone,
I have a draft genome FASTA file and a GFF annotation file. The GFF file looks like this:
9311_chr12 GLEAN mRNA 17901210 17902763 0.90124 + . ID=9311_GLEAN_10008559;
9311_chr12 GLEAN CDS 17901210 17901318 . + 0 Parent=9311_GLEAN_10008559;
9311_chr12 GLEAN CDS 17901418 17901486 . + 2 Parent=9311_GLEAN_10008559;
9311_chr12 GLEAN CDS 17901566 17901672 . + 2 Parent=9311_GLEAN_10008559;
9311_chr12 GLEAN CDS 17901722 17901755 . + 0 Parent=9311_GLEAN_10008559;
9311_chr12 GLEAN CDS 17902585 17902763 . + 2 Parent=9311_GLEAN_10008559;
9311_chr04 GLEAN mRNA 22207209 22208012 0.999282 - . ID=9311_GLEAN_10029041;
9311_chr04 GLEAN CDS 22207209 22208012 . - 0 Parent=9311_GLEAN_10029041;
My goal is to get the gene coding sequences (without UTRs). I can filter the GFF file to keep only the CDS features, but how do I do the next step, that is, extract the CDS sequences themselves from the genome? Thank you very much!
Sorry for the duplicate post. I have found the solution in the Biostars thread "Extract Cds Fastas From A Gff Annotation + Reference Sequence".
Thanks for your attention.
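For reference, here is a minimal Python sketch of one way to do this step by hand. It is not necessarily the approach used in the linked thread. It assumes that CDS rows are grouped by their Parent attribute (as in the example above), that the first word of each FASTA header matches column 1 of the GFF, and that coordinates are 1-based and inclusive as in the GFF format. The function names (read_fasta, reverse_complement, extract_cds) are made up for illustration, and the phase column is ignored.

```python
from collections import defaultdict

# Translation table for complementing DNA; unlisted characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}.

    The name is the first whitespace-delimited word after '>'.
    """
    seqs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(gff_lines, genome):
    """Join the CDS segments of each Parent into one coding sequence.

    gff_lines: iterable of GFF rows (tab- or space-separated columns).
    genome:    dict from read_fasta().
    Returns {parent_id: cds_sequence}.
    """
    segments = defaultdict(list)  # parent -> [(chrom, start, end), ...]
    strands = {}                  # parent -> '+' or '-'
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split into at most 9 columns so the attribute column stays whole.
        cols = line.strip().split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        segments[parent].append((cols[0], int(cols[3]), int(cols[4])))
        strands[parent] = cols[6]

    cds = {}
    for parent, segs in segments.items():
        # Concatenate in genomic order; GFF is 1-based inclusive,
        # so [start - 1:end] is the matching Python slice.
        segs.sort(key=lambda s: s[1])
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end in segs)
        # Minus-strand genes: reverse-complement the joined sequence.
        if strands[parent] == "-":
            seq = reverse_complement(seq)
        cds[parent] = seq
    return cds
```

To use it on real files, pass open file handles, for example extract_cds(open("annotation.gff"), read_fasta(open("genome.fa"))), where both file names are placeholders, and then write each entry of the returned dictionary out as a FASTA record.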