I would like to use HTSeq (htseq-count) and edgeR to analyze our alligator RNA-Seq data. The alligator GFF3 file I downloaded from GigaDB is not accepted by htseq-count (an excerpt is shown below). What I need is a gene symbol in the exon rows, e.g.
scaffold-729 AUGUSTUS exon 101305 101913 . - . ID=exon67799;Parent=rna5642;Name=WNT3A
However, there is no gene symbol in the exon rows; the gene symbol I need appears only in the gene row. Could you show me how to modify the GFF3 file so that htseq-count can accept it? Many thanks.
Gary
scaffold-729 AUGUSTUS gene 101305 186845 1 - . ID=gene3770;Name=WNT3A;gene=WNT3A;Dbxref=CrocBase:AMISG003770,GeneID:395396,PhylomeDB:Phy004KWLF_ALLMI;Note=WNT3A inferred by phylogenetic tree homology from Gallus gallus EntrezGene:395396 PhylomeDB:Phy004KWLF_ALLMI
scaffold-729 AUGUSTUS mRNA 101305 186845 . - . ID=rna5642;Name=AMIST005642;transcript_id=AMIST005642;gene=WNT3A;Dbxref=CrocBase:AMIST005642,GeneID:395396,PhylomeDB:Phy004KWLF_ALLMI;Parent=gene3770;Note=WNT3A inferred by phylogenetic tree homology from Gallus gallus EntrezGene:395396 PhylomeDB:Phy004KWLF_ALLMI
scaffold-729 AUGUSTUS CDS 101434 101913 . - 0 ID=cd59543;Parent=rna5642
scaffold-729 AUGUSTUS CDS 106298 106563 . - 2 ID=cd59544;Parent=rna5642
scaffold-729 AUGUSTUS CDS 141700 141941 . - 1 ID=cd59545;Parent=rna5642
scaffold-729 AUGUSTUS CDS 186490 186560 . - 0 ID=cd59546;Parent=rna5642
scaffold-729 AUGUSTUS exon 101305 101913 . - . ID=exon67799;Parent=rna5642
scaffold-729 AUGUSTUS exon 106298 106563 . - . ID=exon67800;Parent=rna5642
scaffold-729 AUGUSTUS exon 141700 141941 . - . ID=exon67801;Parent=rna5642
scaffold-729 AUGUSTUS exon 186490 186845 . - . ID=exon67802;Parent=rna5642
scaffold-729 AUGUSTUS intron 101914 106297 . - . ID=intron53902;Parent=rna5642
scaffold-729 AUGUSTUS intron 106564 141699 . - . ID=intron53903;Parent=rna5642
scaffold-729 AUGUSTUS intron 141942 186489 . - . ID=intron53904;Parent=rna5642
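One way to do what is asked above is sketched below in Python (standard library only). It assumes that the downloaded file is tab-delimited as the GFF3 format requires (the excerpt above shows spaces), that every exon's Parent is an mRNA row, and that every mRNA's Parent is a gene row, as in the WNT3A example. The script follows exon to mRNA to gene and appends the gene's Name to each exon row, falling back to the gene ID when a gene has no symbol. The function and script names are illustrative, not part of HTSeq.

```python
import sys


def parse_attributes(column9):
    """Split a GFF3 attribute column ('ID=x;Parent=y') into a dict of raw values."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value
    return attrs


def add_gene_name_to_exons(lines):
    """Return the GFF3 lines with Name=<gene symbol> appended to exon rows.

    Pass 1 records gene ID -> symbol and mRNA ID -> parent gene ID.
    Pass 2 follows each exon's Parent (mRNA) to its gene and appends the Name.
    """
    gene_name = {}  # gene ID -> symbol, e.g. gene3770 -> WNT3A
    rna_gene = {}   # mRNA ID -> gene ID, e.g. rna5642 -> gene3770
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            # Fall back to the gene ID when the gene carries no symbol.
            gene_name[attrs["ID"]] = attrs.get("Name", attrs["ID"])
        elif cols[2] == "mRNA" and "ID" in attrs and "Parent" in attrs:
            rna_gene[attrs["ID"]] = attrs["Parent"].split(",")[0]

    out = []
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if not line.startswith("#") and len(cols) == 9 and cols[2] == "exon":
            attrs = parse_attributes(cols[8])
            # If an exon lists several parents, only the first one is followed.
            rna = attrs.get("Parent", "").split(",")[0]
            name = gene_name.get(rna_gene.get(rna))
            if name is not None and "Name" not in attrs:
                cols[8] = cols[8].rstrip(";") + ";Name=" + name
                line = "\t".join(cols) + "\n"
        out.append(line)
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python add_exon_names.py input.gff3 > output.gff3
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(add_gene_name_to_exons(handle.readlines()))
```

The modified file could then be given to htseq-count with -i Name while keeping the default -t exon. Keep in mind that htseq-count reports one count per distinct attribute value, so two gene models that share a symbol would be merged into a single row.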
Maybe you can try -i="Name". See the doc.
Many thanks. However, after trying -i=Name or -i='Name', htseq-count shows an error. I guess that htseq-count can only use the Name attribute if it appears on the exon rows themselves.
Gary
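The guess above matches how htseq-count behaves: it reads the attribute given with -i from every feature of the type given with -t (exon by default), so that attribute has to be present on the exon rows. The small stdlib-only Python check below, assuming a tab-delimited GFF3 file, reports which feature types carry a given attribute. The function and script names are illustrative.

```python
import sys
from collections import Counter


def attribute_coverage(lines, attribute):
    """Count rows per feature type (column 3) that do and do not carry `attribute`."""
    has, lacks = Counter(), Counter()
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        # Attribute keys are the text before '=' in each ';'-separated field.
        keys = {f.split("=", 1)[0].strip() for f in cols[8].split(";") if "=" in f}
        (has if attribute in keys else lacks)[cols[2]] += 1
    return has, lacks


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python check_attr.py annotation.gff3 Name
    with open(sys.argv[1]) as handle:
        has, lacks = attribute_coverage(handle, sys.argv[2])
    print("with:", dict(has))
    print("without:", dict(lacks))
```

For the excerpt in the question, Name is found on the gene and mRNA rows but on none of the CDS, exon or intron rows, which is why -i Name fails with the default -t exon.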
As GouthamAtla implied, the defaults (-t exon, -i gene_id) are appropriate for GTF files from Ensembl. They aren't necessarily applicable to an arbitrary GFF file (that's part of the problem with GFF as a format). When something doesn't work, reading the documentation should be your first step.
Thanks, you are right. By default, htseq-count expects GTF-style annotation. I can run htseq-count without problems on mouse and chicken RNA-Seq data using RefSeq or Ensembl annotation files downloaded from iGenomes. I think my problem is that I don't know how to modify the alligator GFF file to match the format htseq-count needs, as shown in its documentation.
Gary
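Another option is to convert the exon rows to GTF, so that htseq-count's defaults (-t exon, -i gene_id) apply unchanged, just as with the Ensembl and RefSeq files that already work. The Python sketch below assumes a tab-delimited GFF3 file in which transcripts are typed mRNA and point to their gene via Parent, as in the excerpt; exons of other transcript types are skipped. Here gene_id holds the GFF3 gene ID (e.g. gene3770) so counts stay per gene model, while gene_name carries the symbol and could be selected with -i gene_name instead. The function and script names are illustrative.

```python
import sys


def parse_gff3_attributes(column9):
    """Parse 'key=value;key=value' into a dict, keeping values as written."""
    return dict(f.split("=", 1) for f in column9.strip().strip(";").split(";") if "=" in f)


def gff3_exons_to_gtf(lines):
    """Emit one GTF exon line per resolvable GFF3 exon (gene_id, transcript_id, gene_name)."""
    gene_name, rna_gene = {}, {}  # gene ID -> symbol; mRNA ID -> gene ID
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_gff3_attributes(cols[8])
        rows.append((cols, attrs))
        if cols[2] == "gene" and "ID" in attrs:
            gene_name[attrs["ID"]] = attrs.get("Name", attrs["ID"])
        elif cols[2] == "mRNA" and "ID" in attrs and "Parent" in attrs:
            rna_gene[attrs["ID"]] = attrs["Parent"].split(",")[0]

    gtf = []
    for cols, attrs in rows:
        if cols[2] != "exon":
            continue
        rna = attrs.get("Parent", "").split(",")[0]
        gene = rna_gene.get(rna)
        if gene is None:
            continue  # transcript or gene could not be resolved
        extra = f'gene_id "{gene}"; transcript_id "{rna}"; gene_name "{gene_name.get(gene, gene)}";'
        gtf.append("\t".join(cols[:8] + [extra]) + "\n")
    return gtf


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gff3_to_gtf.py input.gff3 > output.gtf
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(gff3_exons_to_gtf(handle))
```

Only exon rows are written, since those are the only features htseq-count uses with the default -t exon.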