I created a GTF file for HLA alleles to be used as a resource for GATK Funcotator. Running Funcotator without indexing the GTF gives this error:
A USER ERROR has occurred: Input funcotator_dataSources.v1.7.20200521s/gencode/hla/hla.annotation.gtf must support random access to enable queries by interval. If it's a file, please index it using the bundled tool IndexFeatureFile
The first few lines of the GTF file:
hla_a_01_01_01_01 IMGHLA gene 1 3503 . + . gene_id "hla_a_01_01_01_01"; gene_name "hla_a_01_01_01_01"; source "IMGHLA";
hla_a_01_01_01_01 IMGHLA transcript 1 3503 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1";
hla_a_01_01_01_01 IMGHLA exon 301 373 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1"; exon_number "1"; exon_id "hla_a_01_01_01_01_e_1";
hla_a_01_01_01_01 IMGHLA exon 504 773 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1"; exon_number "2"; exon_id "hla_a_01_01_01_01_e_2";
hla_a_01_01_01_01 IMGHLA exon 1015 1290 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1"; exon_number "3"; exon_id "hla_a_01_01_01_01_e_3";
I need to index this file before running Funcotator. I tried GATK's IndexFeatureFile, as Funcotator suggests, but it fails with this error:
A USER ERROR has occurred: Unknown file is malformed: Decoded feature is not valid: hla_a_01_01_01_01 IMGHLA gene 1 3503 . + . gene_id "hla_a_01_01_01_01"; gene_name "hla_a_01_01_01_01"; source "IMGHLA";
hla_a_01_01_01_01 IMGHLA transcript 1 3503 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1";
hla_a_01_01_01_01 IMGHLA exon 301 373 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1"; exon_number 1; exon_id "hla_a_01_01_01_01_e_1";
hla_a_01_01_01_01 IMGHLA exon 504 773 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1"; exon_number 2; exon_id "hla_a_01_01_01_01_e_2";
hla_a_01_01_01_01 IMGHLA exon 1015 1290 . + . gene_id "hla_a_01_01_01_01"; transcript_id "hla_a_01_01_01_01.1"; gene_name "hla_a_01_01_01_01"; transcript_name "hla_a_01_01_01_01.1"; exon_number 3; exon_id "hla_a_01_01_01_01_e_3";
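Since "Decoded feature is not valid" does not say which field is the problem, a quick structural check of the file may help narrow it down. The following is a minimal Python sketch, not GATK's own validation: it reports records that do not have nine tab-separated columns and records that lack attributes normally present in GENCODE-style GTFs. The key list `GENCODE_STYLE_KEYS` and the function names are hypothetical, and whether IndexFeatureFile actually requires each of these attributes is an assumption, not something confirmed here.

```python
# Hypothetical diagnostic: this is NOT the validation GATK performs.
# Attribute keys commonly present in GENCODE GTF records. Whether GATK's
# GTF codec requires each of them is an assumption made for this sketch.
GENCODE_STYLE_KEYS = (
    "gene_id", "gene_type", "gene_name", "level",
    "transcript_id", "transcript_type", "transcript_name",
)


def parse_attributes(field):
    """Split the 9th GTF column into a {key: value} dict.

    Simplification: splits on ';', so values containing ';' are not handled.
    """
    attrs = {}
    for chunk in field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def check_gtf_lines(lines):
    """Return a list of (line_number, problem) tuples for the given lines."""
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                (n, f"expected 9 tab-separated columns, found {len(cols)}")
            )
            continue
        attrs = parse_attributes(cols[8])
        # Gene records are not expected to carry transcript_* attributes.
        wanted = [
            k for k in GENCODE_STYLE_KEYS
            if cols[2] != "gene" or not k.startswith("transcript_")
        ]
        missing = [k for k in wanted if k not in attrs]
        if missing:
            problems.append((n, "missing attributes: " + ", ".join(missing)))
    return problems
```

Passing an open file handle for the GTF to `check_gtf_lines` and printing the returned tuples lists every record that differs from a GENCODE-style one, which gives a starting point for comparing the file against the GENCODE GTF that ships with the Funcotator data sources.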
Can someone suggest a fix for this, or an alternative tool for indexing the GTF file?
cross posted: https://stackoverflow.com/questions/78996218/