I'm trying to annotate a VCF file with gene information using bcftools.
My annotation file is a .bed file that was originally an hg38 UCSC knownGene GTF file, converted with BEDOPS: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/
Original GTF file:
chr1 11868 12227 . . + knownGene exon . gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2"; exon_number "1"; exon_id "ENST00000456328.2.1";
chr1 11868 14409 . . + knownGene transcript . gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2";
chr1 12009 12057 . . + knownGene exon . gene_id "ENST00000450305.2"; transcript_id "ENST00000450305.2"; exon_number "1"; exon_id "ENST00000450305.2.1";
chr1 12009 13670 . . + knownGene transcript . gene_id "ENST00000450305.2"; transcript_id "ENST00000450305.2";
I've been able to annotate my VCF file with 2 columns (feature and frame), but bcftools does not add the attributes column (gene_id, transcript_id, etc.) from the .bed file.
I run the following command to annotate my VCF file:
bcftools annotate -a hg38.knownGene.bed.gz \
  -c CHROM,-,INFO/FEATURE,FROM,TO,-,-,INFO/FRAME,INFO/ATTRIBUTES \
  -h hdr.txt \
  input_SNPs.vcf
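To see where each value ends up, it can help to line the -c list up against one line of the annotation file. The sketch below uses a hypothetical Python helper (`map_columns` is not part of bcftools) and assumes the file is tab-separated and that bcftools assigns -c entries to file columns strictly by position. Judging by the example lines above, the file's columns appear to be chrom, start, end, two placeholder columns, strand, source, feature, frame and attributes, which is not the order the -c list expects.

```python
def map_columns(line: str, columns: str) -> list[tuple[str, str]]:
    """Hypothetical helper: pair each -c entry with the file column at the same position."""
    fields = line.rstrip("\n").split("\t")
    targets = columns.split(",")
    # zip stops at the shorter list, so extra file columns are simply unused
    return list(zip(targets, fields))


# Second example line from above, assumed to be tab-separated in the real file
bed_line = "\t".join([
    "chr1", "11868", "14409", ".", ".", "+", "knownGene", "transcript", ".",
    'gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2";',
])
cols = "CHROM,-,INFO/FEATURE,FROM,TO,-,-,INFO/FRAME,INFO/ATTRIBUTES"

for target, value in map_columns(bed_line, cols):
    print(f"{target:16} <- {value}")
```

Run on that line, INFO/FEATURE is paired with the end coordinate 14409, INFO/FRAME with the feature type "transcript", and INFO/ATTRIBUTES with the "." frame placeholder, which resembles the FEATURE=29570 / FRAME=transcript values in the output below. A "." is usually treated as a missing value, which may be why no ATTRIBUTES tag appears. If the layout guess is right, a list such as CHROM,FROM,TO,-,-,-,-,INFO/FEATURE,INFO/FRAME,INFO/ATTRIBUTES would line the names up with the data; this is an untested assumption about the file.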
hdr.txt is as follows:
##INFO=<ID=ATTRIBUTES,Number=1,Type=String,Description="Attributes, including gene_id, transcript_id, exon_number, and exon_id">
##INFO=<ID=FEATURE,Number=1,Type=String,Description="feature type name">
##INFO=<ID=FRAME,Number=1,Type=String,Description="frame number">
Feature and frame are added to the INFO field of the VCF, but the attributes (gene_id, transcript_id, etc.) are not.
VCF file results:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GW1910172553rd
chr1 14464 . A T . PASS DP=6;ECNT=1;POP_AF=0.001;P_GERMLINE=-0.0002169;TLOD=20.55;FEATURE=29570;FRAME=transcript GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:1,5:0.833:1,5:0,0:41:166416,357:60:11:0,0.828,0.833:0.03,0.025,0.945
chr1 14653 . C T . PASS DP=102;ECNT=2;POP_AF=0.001;P_GERMLINE=-0.0002169;TLOD=266.34;FEATURE=29570;FRAME=transcript GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:27,68:0.699:27,68:0,0:41:270,225:60:23:0.697,0.707,0.716:0.019,0.031,0.951
I'm not sure why bcftools won't add the attributes column. Could the spaces and semicolons in the attributes field be confusing bcftools?
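The semicolon concern is plausible on its own: VCF separates INFO entries with ";", and the VCF 4.2 specification does not allow whitespace, semicolons or equals signs inside INFO values. A tiny Python illustration (a naive split, not how bcftools parses records internally) of what an unmodified attributes string would do to an INFO field:

```python
def split_info(info: str) -> list[str]:
    # VCF separates INFO entries with ';' (naive split, for illustration only)
    return info.split(";")


raw_attributes = 'gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2";'
info = "DP=6;ATTRIBUTES=" + raw_attributes
print(split_info(info))
# ['DP=6', 'ATTRIBUTES=gene_id "ENST00000456328.2"', ' transcript_id "ENST00000456328.2"', '']
```

Everything after the first ";" of the attributes would be read as separate, malformed INFO entries, so the attributes would probably need reformatting before they can be stored as a single INFO value.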
If anyone could help with adding the attributes column to my VCF file, I would appreciate it.
Is there any way to add the entire attributes column (gene_id, transcript_id, exon_id, etc.) as a single, combined field? I don't really need separate gene_id, transcript_id, etc. columns, but I can split them if there's no other way to annotate the VCF file with the attributes.
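One possible way to get a single combined field is to pre-process the annotation file so that the attributes become one VCF-safe value. The Python sketch below is hypothetical: `convert_line` and `convert_file` are illustrative names, the 10-column layout is inferred from the example lines above rather than confirmed, and the `key:value|key:value` output style is an arbitrary choice.

```python
import gzip
import re


def convert_line(line: str) -> str:
    """Reduce one 10-column line to: chrom, start, end, feature, attributes.

    Assumes the layout seen in the example lines (chrom, start, end,
    placeholder, placeholder, strand, source, feature, frame, attributes).
    """
    f = line.rstrip("\n").split("\t")
    chrom, start, end, feature, attrs = f[0], f[1], f[2], f[7], f[9]
    # Drop quotes, split on ';' and skip the empty piece after the last ';'
    pieces = [p.strip() for p in attrs.replace('"', "").split(";") if p.strip()]
    # 'gene_id ENST...' -> 'gene_id:ENST...', pairs joined with '|'
    combined = "|".join(p.replace(" ", ":", 1) for p in pieces)
    # Replace any whitespace, ';', '=' or ',' that is still left over
    combined = re.sub(r"[\s;=,]", "_", combined)
    # start/end are copied unchanged, so the output stays 0-based like the input
    return "\t".join([chrom, start, end, feature, combined])


def convert_file(in_path: str, out_path: str) -> None:
    """Hypothetical helper: write a plain-text BED with the combined column."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip() and not line.startswith("#"):
                dst.write(convert_line(line) + "\n")


example = "\t".join([
    "chr1", "12009", "13670", ".", ".", "+", "knownGene", "transcript", ".",
    'gene_id "ENST00000450305.2"; transcript_id "ENST00000450305.2";',
])
print(convert_line(example))
```

The output file would still need to be compressed with bgzip and indexed with tabix -p bed. With a .bed.gz name, the -c list could then shrink to CHROM,FROM,TO,INFO/FEATURE,INFO/ATTRIBUTES, reusing the existing hdr.txt. Keeping the .bed suffix matters: bcftools reads .bed/.bed.gz annotation files as 0-based and other tab-delimited files as 1-based, so a different suffix would put every interval start off by one. This sketch has not been tested against the full knownGene file.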