Hi,
I am trying to annotate my MACS2 ChIP-seq peak file with HOMER's annotatePeaks.pl, using a custom GFF3 file (RefSeq, GRCh38). With this GFF3 file, annotatePeaks.pl reports the parent ID (e.g. rna7) rather than the transcript ID (e.g. NR_026818.1).
Is there a way I can convert my GFF3 file to a GTF file that uses the transcript ID rather than the parent ID?
I noticed the same problem when I used STAR/RSEM with a GFF3 file. After the STAR alignment, the transcripts are named by their parent IDs. I have to use --append-names in RSEM, and I end up with annotated transcripts named rna##_transcriptID. All I want is the transcript ID.
What is the point of the parent ID? It doesn't help to have a bunch of annotated genes with the name gene1, gene2, gene3.
Thanks for the help.
Simon
GFF3 sample:
NC_000001.11 RefSeq region 1 248956422 . + . ID=id0;Dbxref=taxon:9606;Name=1;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
NC_000001.11 BestRefSeq gene 11874 14409 . + . ID=gene0;Dbxref=GeneID:100287102,HGNC:HGNC:37102;Name=DDX11L1;description=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;gene_biotype=misc_RNA;pseudo=true
NC_000001.11 BestRefSeq transcript 11874 14409 . + . ID=rna0;Parent=gene0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;Name=NR_046018.2;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1;transcript_id=NR_046018.2
NC_000001.11 BestRefSeq exon 11874 12227 . + . ID=id1;Parent=rna0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1;transcript_id=NR_046018.2
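To make the goal concrete, here is a minimal Python sketch of the conversion I have in mind. It is only an illustration, not a tested tool. It assumes the GFF3 is tab-separated, that transcript rows carry a transcript_id attribute as in the sample above, and that the gene attribute is an acceptable gene_id. The function names (parse_attributes, gff3_to_gtf_lines) are made up for this example.

```python
def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf_lines(gff3_lines, features=("exon", "CDS")):
    """Return GTF lines for the chosen features, named by transcript_id
    (e.g. NR_046018.2) instead of the internal GFF3 ID (e.g. rna0)."""
    rows = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature row
        rows.append((cols, parse_attributes(cols[8])))

    # First pass: remember which accession each internal ID stands for.
    id_to_transcript = {}
    for cols, attrs in rows:
        if "ID" in attrs and "transcript_id" in attrs:
            id_to_transcript[attrs["ID"]] = attrs["transcript_id"]

    # Second pass: write exon/CDS rows with GTF-style attributes.
    out = []
    for cols, attrs in rows:
        if cols[2] not in features:
            continue
        if "transcript_id" in attrs:
            transcripts = [attrs["transcript_id"]]
        else:
            # Fall back to the Parent lookup (Parent may be comma-separated).
            parents = attrs.get("Parent", "").split(",")
            transcripts = [id_to_transcript[p] for p in parents
                           if p in id_to_transcript]
        for transcript in transcripts:
            # Assumption: use the 'gene' attribute as gene_id when present.
            gene = attrs.get("gene", transcript)
            gtf_attrs = 'gene_id "%s"; transcript_id "%s";' % (gene, transcript)
            out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python gff3_to_gtf.py in.gff3 > out.gtf
    with open(sys.argv[1]) as handle:
        for gtf_line in gff3_to_gtf_lines(handle):
            print(gtf_line)
```

With the exon row from the sample, this would give a line ending in gene_id "DDX11L1"; transcript_id "NR_046018.2"; which is the naming I would like the annotation tools to see.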