Question

Add gene names to the 'isec' output files of bcftools

3.0 years ago

minoo ▴ 10

I have two VCF files, and I used isec from bcftools to find the sample-specific and shared mutations between the samples. The isec output was four vcf.gz files, as follows:

isec_output/0000.vcf.gz: variants unique to 1.vcf.gz
isec_output/0001.vcf.gz: variants unique to 2.vcf.gz
isec_output/0002.vcf.gz: variants shared by 1.vcf.gz and 2.vcf.gz, as represented in 1.vcf.gz
isec_output/0003.vcf.gz: variants shared by 1.vcf.gz and 2.vcf.gz, as represented in 2.vcf.gz

The output files look like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
chrM    5   .   A   .   .   PASS    .   GT:GQX:DP:DPF   0/0:358:120:0
chrM    7   .   A   .   .   PASS    END=9;BLOCKAVG_min30p3a GT:GQX:DP:DPF   0/0:560:187:3
chrM    10  .   T   .   .   PASS    END=13;BLOCKAVG_min30p3a    GT:GQX:DP:DPF   0/0:782:261:5
chrM    14  .   T   .   .   PASS    END=17;BLOCKAVG_min30p3a    GT:GQX:DP:DPF   0/0:1092:364:5

How can I add gene names to these files? I am new to this field and don't know how to identify mutations by gene name from these files. Do I need further annotation steps?

bcftools vcftools gene-symbol • 1.6k views

updated 17 months ago by Ram 44k • written 3.0 years ago by minoo ▴ 10

You can intersect with the dbSNP VCF file, which has a GENEINFO tag holding the gene symbol and ID, but that would be resource-intensive. Try a BED file instead, as Pierre suggested below: download a GTF file for human, filter it for genes, and convert the filtered GTF to a BED file.
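The GTF-to-BED step could be sketched roughly as below. This is a minimal sketch, not a tested pipeline: it assumes a GENCODE/Ensembl-style GTF whose attribute column carries a `gene_name "..."` entry, and the function names `gtf_genes_to_bed` and `format_bed` are made up for illustration. GTF coordinates are 1-based and inclusive while BED starts are 0-based, hence the `- 1`. The chromosome names in the BED must match the ones in the VCF (the VCF above uses `chrM`-style names).

```python
import re

# Assumption: gene symbols are stored as gene_name "XYZ"; in the attribute column.
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gtf_genes_to_bed(gtf_lines):
    """Return sorted (chrom, start0, end, gene_name) tuples for 'gene' features."""
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        match = GENE_NAME.search(fields[8])
        if match is None:
            continue  # no gene symbol available for this record
        # GTF is 1-based inclusive; BED is 0-based, half-open.
        rows.append((fields[0], int(fields[3]) - 1, int(fields[4]), match.group(1)))
    # tabix needs the file sorted by chromosome and start position.
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    return rows


def format_bed(rows):
    """Render the rows as tab-separated BED text (chrom, start, end, name)."""
    return "".join(f"{c}\t{s}\t{e}\t{n}\n" for c, s, e, n in rows)


# Illustrative input only; a real run would iterate over the lines of the GTF file.
example = [
    "#!illustrative header line\n",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n',
    'chr1\tHAVANA\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n',
]
print(format_bed(gtf_genes_to_bed(example)), end="")
```

For a real file, pass an open file handle instead of `example` and write the result of `format_bed` to something like `genes.bed` (a placeholder name).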

3.0 years ago by cpad0112 21k

score 0 · Answer 1 · 2021-12-07

3.0 years ago

Pierre Lindenbaum 164k

Get a BED file with the gene names, compress it with bgzip, index it with tabix, and then use bcftools annotate to add the gene names to the VCF.
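As a rough sketch of those three steps, the Python helper below only assembles the command lines and does not execute anything. The file names `genes.bed` and `gene.hdr`, the output name, and the INFO tag `GENE` are placeholders chosen for illustration, not part of the original post. Because a BED file is not a VCF, bcftools annotate needs a header line describing the new tag (passed with `-h`), and the `-c CHROM,FROM,TO,GENE` list maps the four BED columns onto the region and the tag.

```python
import shlex


def info_header_line(tag="GENE"):
    """Header line that declares the new INFO tag; save it to the file given to -h."""
    return f'##INFO=<ID={tag},Number=1,Type=String,Description="Gene name">'


def build_annotation_commands(bed, vcf_in, vcf_out, header="gene.hdr", tag="GENE"):
    """Return the three steps as argv lists; nothing is run here."""
    bed_gz = bed + ".gz"  # bgzip writes <bed>.gz next to the input
    return [
        ["bgzip", bed],                  # 1. block-compress the sorted BED
        ["tabix", "-p", "bed", bed_gz],  # 2. index it as a BED file
        [                                # 3. copy the name column into INFO/<tag>
            "bcftools", "annotate",
            "-a", bed_gz,
            "-h", header,
            "-c", f"CHROM,FROM,TO,{tag}",
            "-O", "z",
            "-o", vcf_out,
            vcf_in,
        ],
    ]


# Print the commands so they can be inspected or pasted into a shell.
print(info_header_line())
for cmd in build_annotation_commands(
    "genes.bed", "isec_output/0000.vcf.gz", "0000.genes.vcf.gz"
):
    print(shlex.join(cmd))
```

To actually run the steps, write `info_header_line()` to `gene.hdr` and pass each list to `subprocess.run(cmd, check=True)`. The BED file has to be sorted by chromosome and start before indexing, and it should keep the `.bed` suffix so that bcftools treats the coordinates as 0-based.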

Where can I get a BED file with gene names? Also, should the VCF files be normalized before running isec? My samples are human, by the way.

3.0 years ago by minoo ▴ 10

Try the UCSC Table Browser: https://genome.ucsc.edu/cgi-bin/hgTables/

3.0 years ago by cpad0112 21k

Hi @Pierre Lindenbaum, thanks for your reply, but I am still stuck on this problem. Could you please explain your response in more detail so I can understand what I should do? I have no idea about BED files, bgzip, tabix, or even bcftools annotate.

2.8 years ago by minoo ▴ 10