How to add attribute annotations (e.g. gene_id, transcript_id, exon_id) from a .bed file to a VCF?
19 months ago
yuuniper • 0

I'm trying to annotate genes onto a VCF file with bcftools.

My annotation file is a .bed file converted with BEDOPS from the hg38 UCSC knownGene GTF file (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/).

First lines of the annotation file (the column order shown is the converted BED layout, not the original GTF order):

chr1    11868   12227   .       .       +       knownGene       exon    .       gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2"; exon_number "1"; exon_id "ENST00000456328.2.1";
chr1    11868   14409   .       .       +       knownGene       transcript      .       gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2"; 
chr1    12009   12057   .       .       +       knownGene       exon    .       gene_id "ENST00000450305.2"; transcript_id "ENST00000450305.2"; exon_number "1"; exon_id "ENST00000450305.2.1";
chr1    12009   13670   .       .       +       knownGene       transcript      .       gene_id "ENST00000450305.2"; transcript_id "ENST00000450305.2"; 

I've been able to successfully annotate 2 columns (feature and frame) onto my VCF file, but bcftools is unable to add the attributes section (gene_id, transcript_id, etc.) of the .bed file.

I run the following command to annotate my VCF file:

bcftools annotate -a hg38.knownGene.bed.gz \
-c CHROM,-,INFO/FEATURE,FROM,TO,-,-,INFO/FRAME,INFO/ATTRIBUTES \
-h hdr.txt \
input_SNPs.vcf

hdr.txt is as follows:

##INFO=<ID=ATTRIBUTES,Number=1,Type=String,Description="Attributes, including gene_id, transcript_id, exon_number, and exon_id">
##INFO=<ID=FEATURE,Number=1,Type=String,Description="feature type name">
##INFO=<ID=FRAME,Number=1,Type=String,Description="frame number">

Feature and frame are added to the INFO field of the VCF, but attributes such as gene_id and transcript_id are not.

VCF file results:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  GW1910172553rd
chr1    14464   .       A       T       .       PASS    DP=6;ECNT=1;POP_AF=0.001;P_GERMLINE=-0.0002169;TLOD=20.55;FEATURE=29570;FRAME=transcript        GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:1,5:0.833:1,5:0,0:41:166416,357:60:11:0,0.828,0.833:0.03,0.025,0.945
chr1    14653   .       C       T       .       PASS    DP=102;ECNT=2;POP_AF=0.001;P_GERMLINE=-0.0002169;TLOD=266.34;FEATURE=29570;FRAME=transcript     GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:27,68:0.699:27,68:0,0:41:270,225:60:23:0.697,0.707,0.716:0.019,0.031,0.951

I'm not sure why bcftools won't add the attributes column. Could the spaces and semicolons inside the attributes field be interfering with how bcftools parses it?

If anyone could help with adding the attributes column to my VCF file, I would appreciate it.

vcf gtf bcftools • 786 views
19 months ago

bcftools annotate works on tab-delimited annotation files.

The gene_id and transcript_id are not separate columns but part of a single column.
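As a quick check of which field each -c entry lines up with, here is a minimal Python sketch. It assumes the .bed file really is tab-delimited with the ten fields shown in the question (the sample row is rebuilt here with explicit tabs), and it only pairs positions; it does not reproduce anything bcftools does internally.

```python
# One row in the layout shown in the question, rebuilt with explicit tabs
# (assumption: the real file is tab-delimited with these ten fields).
row = "\t".join([
    "chr1", "11868", "12227", ".", ".", "+", "knownGene", "exon", ".",
    'gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2"; '
    'exon_number "1"; exon_id "ENST00000456328.2.1";',
])

# The -c list from the question, in order.
columns = "CHROM,-,INFO/FEATURE,FROM,TO,-,-,INFO/FRAME,INFO/ATTRIBUTES".split(",")

fields = row.split("\t")

# Pair each -c entry with the field in the same position ("-" means skip).
mapping = {name: value for name, value in zip(columns, fields) if name != "-"}

for name, value in mapping.items():
    print(f"{name:16} <- {value}")

print("fields in row:", len(fields), "| entries in -c:", len(columns))
```

Under that assumption the -c list has nine entries for ten fields, so INFO/ATTRIBUTES is paired with the ninth field (a "." in this row) rather than the attribute string in the tenth. That would also fit the output in the question, where FEATURE holds a number and FRAME holds "transcript".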

If you create a new custom TAB delimited file where the gene_id is an actual standalone column, then I think the process ought to work.
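A rough sketch of how such a file could be built in Python, assuming the attribute string is the last tab-separated field and uses the key "value"; pattern shown in the question. The function names and the choice of keys are illustrative, not part of bcftools or BEDOPS.

```python
import re

# Matches pairs such as: gene_id "ENST00000456328.2"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(attr_field):
    """Turn the attribute string into a dict of key -> value."""
    return dict(ATTR_RE.findall(attr_field))


def to_annotation_row(bed_line, keys=("gene_id", "transcript_id", "exon_id")):
    """Build a tab-delimited row with one standalone column per attribute key.

    Coordinates are copied unchanged from the input line; missing keys
    (e.g. exon_id on transcript rows) are written as ".".
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], fields[1], fields[2]
    attrs = parse_attributes(fields[-1])
    values = [attrs.get(key, ".") for key in keys]
    return "\t".join([chrom, start, end, *values])


# Example rows in the layout from the question (tabs written explicitly).
exon_line = "\t".join([
    "chr1", "11868", "12227", ".", ".", "+", "knownGene", "exon", ".",
    'gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2"; '
    'exon_number "1"; exon_id "ENST00000456328.2.1";',
])
transcript_line = "\t".join([
    "chr1", "11868", "14409", ".", ".", "+", "knownGene", "transcript", ".",
    'gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2";',
])

exon_row = to_annotation_row(exon_line)
transcript_row = to_annotation_row(transcript_line)
print(exon_row)
print(transcript_row)
```

Each output column would then need its own entry in the -c list and its own ##INFO line in hdr.txt. Whether the coordinates need adjusting depends on how the new file is named and indexed, which this sketch does not address.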


Do you think there's any way I can add the entire attributes column (gene_id, transcript_id, exon_id, etc.) as a single, combined column? I don't really need separate gene_id, transcript_id, etc. columns, but I can separate them if there's no other way to annotate the VCF file with the attributes.
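One possible way to keep everything in a single column is to rewrite the attribute string so it contains no spaces, semicolons, equals signs, or quotes, since the VCF format does not allow whitespace, semicolons, or equals signs inside an INFO string value. The sketch below is only an illustration: the function name and the "key:value|key:value" layout are arbitrary choices, not a bcftools convention.

```python
import re

# Matches pairs such as: gene_id "ENST00000456328.2"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def compact_attributes(attr_field):
    """Collapse the attribute string into one INFO-friendly token.

    'gene_id "X"; transcript_id "Y";' becomes 'gene_id:X|transcript_id:Y'.
    """
    pairs = ATTR_RE.findall(attr_field)
    return "|".join(f"{key}:{value}" for key, value in pairs)


attr = ('gene_id "ENST00000456328.2"; transcript_id "ENST00000456328.2"; '
        'exon_number "1"; exon_id "ENST00000456328.2.1";')

compact = compact_attributes(attr)
print(compact)
```

The compacted value could then replace the original attribute field in the tab-delimited annotation file, leaving a single column to map to INFO/ATTRIBUTES.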
