Hi Everyone,
I searched but couldn't find a good answer, so I'm hoping someone can help.
I have a GFF file in this format:
1 gramene gene 199345 205715 . - . ID=gene:Zm00001d027240;biotype=protein_coding;gene_id=Zm00001d027240;logic_name=maker_gene
1 gramene mRNA 199345 205715 . - . ID=transcript:Zm00001d027240_T001;Parent=gene:Zm00001d027240;biotype=protein_coding;transcript_id=Zm00001d027240_T001
1 gramene three_prime_UTR 199345 199763 . - . Parent=transcript:Zm00001d027240_T001
1 gramene exon 199345 199771 . - . Parent=transcript:Zm00001d027240_T001;Name=Zm00001d027240_T002.exon8;constitutive=0;ensembl_end_phase=-1;ensembl_phase=1;exon_id=Zm00001d027240_T002.exon8;rank=9
I'm running htseq-count to collect per-gene counts for downstream differential gene expression analysis. Exon is the recommended feature type to count, but the exon lines in this file only reference their parent transcript (Parent=transcript:...), not the gene ID. How do I get counts per gene ID when counting exons? Do I need to reformat the GFF file, or is there a value I can set the --idattr option to in htseq-count that works even though the exon lines contain no gene ID? Many thanks for your help!
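To make the "reformat the GFF" option concrete, here is a minimal sketch of what I have in mind, not a tested pipeline. It assumes the real file is tab-separated (the excerpt above shows spaces), that transcript IDs start with "transcript:" and their Parent starts with "gene:" as in the lines above, and that appending a gene_id attribute to each exon line is acceptable. The function names are my own and purely illustrative.

```python
def parse_attrs(field):
    """Turn a GFF3 attribute string 'key=value;key=value' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def add_gene_id_to_exons(lines):
    """Return GFF lines with 'gene_id=<gene>' appended to every exon line.

    Assumption: each exon's Parent is a transcript whose own Parent is
    'gene:<gene>', as in the excerpt above.
    """
    lines = [line.rstrip("\n") for line in lines]

    # First pass: map transcript ID -> gene ID using the transcript lines.
    tx_to_gene = {}
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        tx_id = attrs.get("ID", "")
        parent = attrs.get("Parent", "")
        if tx_id.startswith("transcript:") and parent.startswith("gene:"):
            tx_to_gene[tx_id] = parent.split(":", 1)[1]

    # Second pass: append gene_id to exon lines; leave all other lines as-is.
    out = []
    for line in lines:
        cols = line.split("\t")
        if not line.startswith("#") and len(cols) == 9 and cols[2] == "exon":
            attrs = parse_attrs(cols[8])
            # If an exon lists several parents, only the first is used here.
            first_parent = attrs.get("Parent", "").split(",")[0]
            gene = tx_to_gene.get(first_parent)
            if gene is not None and "gene_id" not in attrs:
                cols[8] = cols[8].rstrip(";") + ";gene_id=" + gene
        out.append("\t".join(cols))
    return out
```

With a file rewritten this way, my understanding is that htseq-count could be run with --type exon --idattr gene_id (both are its defaults). The alternative of setting --idattr Parent on the original file would, as far as I can tell, give per-transcript rather than per-gene counts.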