6.1 years ago
I'm trying to convert a GFF3 file to GTF with gffread, and I'm seeing unexpected filtering of most records in the GFF3. It is for a viral genome. Only 10 out of 180 records are going through. Curiously, the first of these two records is converted, but the second is not:
EF999921.1 Genbank transcript 60089 60314 . + 1 ID=cds-ABV71552.1;Dbxref=NCBI_GP:ABV71552.1;Name=ABV71552.1;gbkey=transcript;product=UL22A;protein_id=ABV71552.1
EF999921.1 Genbank transcript 83771 83813 . - 1 ID=cds-ABV71567.1;Dbxref=NCBI_GP:ABV71567.1;Name=ABV71567.1;gbkey=transcript;product=UL37;protein_id=ABV71567.1
They seem to carry the same types of information. The difference between these two examples is the strand (+ vs. -), but the full list of records that make it through is a mix of + and -. Codon start is also mixed.
Does the GFF need to be sorted or prepped in any way? The file is ASCII text with Unix line terminators. Very confused here.
My command is:
path/to/cufflinks-2.2.1/gffread input.gff3 -T -o output.gtf
When we convert with gffread, the gene, transcript, etc. entries are not retained as separate records. However, the required transcript and gene information will be there in the attribute (9th) column of the exon entries that are present. This GTF should suffice for most applications.
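To check what actually ends up in the attribute column, something like the following minimal Python sketch could be used. The exon line here is a hypothetical example built from the first record in the question, not real gffread output, and `parse_gtf_attributes` is a helper name made up for illustration. It assumes simple `key "value";` pairs with no semicolons inside the quoted values.

```python
def parse_gtf_attributes(field):
    """Parse the 9th GTF column (key "value"; pairs) into a dict.

    Assumption: values do not contain semicolons themselves.
    """
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


# Hypothetical exon line (tab-separated), for illustration only.
line = (
    "EF999921.1\tGenbank\texon\t60089\t60314\t.\t+\t.\t"
    'transcript_id "cds-ABV71552.1"; gene_id "ABV71552.1";'
)
columns = line.split("\t")
attrs = parse_gtf_attributes(columns[8])
print(columns[2], attrs["transcript_id"], attrs["gene_id"])
```

Running this over the real output.gtf would show which transcript and gene identifiers are carried on the exon lines.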
However, if you want a GTF that keeps the gene, transcript, etc. entries themselves, please try other tools that do the conversion. Please see this post.
I'm not sure this answers my question. The input examples I show are both transcript features, and their gbkey is transcript as well. How come one is retained in output.gtf but the other isn't?
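One way to narrow this down is to list exactly which transcript IDs from the GFF3 never show up in the GTF. Below is a rough Python sketch of that comparison. It assumes the `transcript_id` in the GTF matches the `ID` attribute in the GFF3 (which may not hold for every file) and that both files are tab-separated. The function names are made up for illustration, and the GTF content in the demo is hypothetical, standing in for an output where only the first record survived.

```python
def gff3_ids(lines, feature_type="transcript"):
    """Collect the ID attribute of every GFF3 record of the given type."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        for pair in cols[8].split(";"):
            key, _, value = pair.partition("=")
            if key.strip() == "ID":
                ids.add(value.strip())
    return ids


def gtf_transcript_ids(lines):
    """Collect every transcript_id found in the GTF attribute column."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for part in cols[8].split(";"):
            key, _, value = part.strip().partition(" ")
            if key == "transcript_id":
                ids.add(value.strip().strip('"'))
    return ids


def dropped_ids(gff3_lines, gtf_lines, feature_type="transcript"):
    """IDs present in the GFF3 but absent from the GTF (assumes matching IDs)."""
    return sorted(gff3_ids(gff3_lines, feature_type) - gtf_transcript_ids(gtf_lines))


# The two records from the question, tab-separated.
gff3_demo = [
    "EF999921.1\tGenbank\ttranscript\t60089\t60314\t.\t+\t1\t"
    "ID=cds-ABV71552.1;Dbxref=NCBI_GP:ABV71552.1;Name=ABV71552.1;"
    "gbkey=transcript;product=UL22A;protein_id=ABV71552.1",
    "EF999921.1\tGenbank\ttranscript\t83771\t83813\t.\t-\t1\t"
    "ID=cds-ABV71567.1;Dbxref=NCBI_GP:ABV71567.1;Name=ABV71567.1;"
    "gbkey=transcript;product=UL37;protein_id=ABV71567.1",
]
# Hypothetical GTF output containing only the first record.
gtf_demo = [
    "EF999921.1\tGenbank\texon\t60089\t60314\t.\t+\t.\t"
    'transcript_id "cds-ABV71552.1";',
]
missing = dropped_ids(gff3_demo, gtf_demo)
print(missing)
```

With the real files, `dropped_ids(open("input.gff3"), open("output.gtf"))` would give the full list of dropped records, which can then be compared against the retained ones for differences in coordinates, attributes, or neighbouring features.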