Hi all,
Any idea why the gene_ids in NCBI's GTF file for the T2T human genome assembly end in "_1"?
NC_060925.1 BestRefSeq gene 52979 54612 . - . gene_id "LOC101928626_1"; transcript_id ""; db_xref "GeneID:101928626"; description "uncharacterized LOC101928626"; gbkey "Gene"; gene "LOC101928626"; gene_biotype "lncRNA";
NC_060925.1 BestRefSeq transcript 52979 54612 . - . gene_id "LOC101928626_1"; transcript_id "NR_125957.1"; db_xref "GeneID:101928626"; exception "annotated by transcript or proteomic data"; gbkey "ncRNA"; gene "LOC101928626"; inference "similar to RNA sequence (same species):RefSeq:NR_125957.1"; note "The RefSeq transcript has 2 substitutions, 1 non-frameshifting indel compared to this genomic sequence"; product "uncharacterized LOC101928626"; transcript_biotype "lnc_RNA";
NC_060925.1 BestRefSeq exon 54522 54612 . - . gene_id "LOC101928626_1"; transcript_id "NR_125957.1"; db_xref "GeneID:101928626"; exception "annotated by transcript or proteomic data"; gene "LOC101928626"; inference "similar to RNA sequence (same species):RefSeq:NR_125957.1"; note "The RefSeq transcript has 2 substitutions, 1 non-frameshifting indel compared to this genomic sequence"; product "uncharacterized LOC101928626"; transcript_biotype "lnc_RNA"; exon_number "1";
NC_060925.1 BestRefSeq gene 111940 112877 . - . gene_id "OR4F29_1"; transcript_id ""; db_xref "GeneID:729759"; db_xref "HGNC:HGNC:31275"; description "olfactory receptor family 4 subfamily F member 29"; gbkey "Gene"; gene "OR4F29"; gene_biotype "protein_coding"; gene_synonym "OR7-21";
NC_060925.1 BestRefSeq transcript 111940 112877 . - . gene_id "OR4F29_1"; transcript_id "NM_001005221.2"; db_xref "GeneID:729759"; exception "annotated by transcript or proteomic data"; gbkey "mRNA"; gene "OR4F29"; inference "similar to RNA sequence, mRNA (same species):RefSeq:NM_001005221.2"; note "The RefSeq transcript has 9 substitutions, 1 frameshift compared to this genomic sequence"; product "olfactory receptor family 4 subfamily F member 29"; tag "RefSeq Select"; transcript_biotype "mRNA";
This breaks some GO enrichment/GSEA analyses, since the suffixed IDs no longer match the usual gene symbols. Is it safe to simply strip the "_1" suffix?
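For reference, here is a rough sketch of the workaround I have in mind. Rather than trimming gene_id, it reads the separate `gene` attribute, which in the lines above carries the symbol without the suffix. The function names are my own, and I'm assuming every attribute is written as `key "value";` like in the excerpt; I haven't checked that this holds for the whole file.

```python
import re

# Matches GTF attribute pairs such as: gene_id "OR4F29_1";
# Assumption: every attribute value is double-quoted, as in the excerpt above.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attr_field):
    """Map each attribute key to a list of values (keys such as db_xref can repeat)."""
    attrs = {}
    for key, value in ATTR_RE.findall(attr_field):
        attrs.setdefault(key, []).append(value)
    return attrs


def gene_symbol(attr_field):
    """Prefer the 'gene' attribute; fall back to gene_id when it is absent."""
    attrs = parse_attributes(attr_field)
    if "gene" in attrs:
        return attrs["gene"][0]
    return attrs["gene_id"][0]


# Ninth column of the OR4F29 gene record, shortened for the example.
example = (
    'gene_id "OR4F29_1"; transcript_id ""; db_xref "GeneID:729759"; '
    'db_xref "HGNC:HGNC:31275"; gene "OR4F29"; gene_biotype "protein_coding";'
)
print(gene_symbol(example))  # prints OR4F29
```

This only sidesteps the problem for symbol lookups; it doesn't tell me why the suffix is there in the first place, or whether two records with different gene_ids can share the same `gene` value.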
cheers
awesome, thanks!