Annotate genes from gtf file to vcf file
QX, 8 months ago

Hi all,

Is there a tool that can annotate the records in a VCF file with the genes from a GTF file, adding the gene names as a column?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1
1       51479   .       T       A       570.64  PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=-1.121e+00;DP=25;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=22.83;ReadPosRankSum=0.953;SOR=0.760    GT:AD:DP:GQ:PL  0/1:5,20:25:99:578,0,99
1       51803   .       T       C       627.06  PASS    AC=2;AF=1.00;AN=2;DP=19;ExcessHet=0.0000;FS=0.000;MLEAC=2;

Please follow up on your threads; not doing so is bad etiquette. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.


Answer, 8 months ago

Not tested:

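# extract "gene" records as sorted 0-based BED: chrom, start, end, gene_name ("." if absent)
# split on [; "]+ (one or more separators); a pattern that can match the empty string is awk-dependent
gunzip -c in.gtf.gz |\
awk -F '\t' '($3=="gene") {G="."; N=split($9,a,/[; "]+/); for(i=1;i<N;i++){if(a[i]=="gene_name") {G=a[i+1];break;}} printf("%s\t%d\t%d\t%s\n",$1,int($4)-1,$5,G);}' |\
sort -t $'\t' -k1,1 -k2,2n |\
bgzip > genes.bed.gz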
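To see what the awk step produces, here is a minimal sketch that runs the same extraction on a tiny made-up GTF (the gene IDs and names `G1`, `G2`, `GENEA` are hypothetical). The function name `gtf_genes_to_bed` is also hypothetical; it skips the gunzip/sort/bgzip parts so it only needs a POSIX shell and awk.

```shell
# Hypothetical helper: gtf_genes_to_bed
# Reads GTF on stdin, writes BED (chrom, 0-based start, end, gene_name) on stdout.
gtf_genes_to_bed() {
  awk -F '\t' '
    $3 == "gene" {
      G = "."                      # default when there is no gene_name attribute
      N = split($9, a, /[; "]+/)   # tokens: key, value, key, value, ...
      for (i = 1; i < N; i++) {
        if (a[i] == "gene_name") { G = a[i + 1]; break }
      }
      # GTF is 1-based inclusive; BED start is 0-based, end stays the same
      printf("%s\t%d\t%d\t%s\n", $1, $4 - 1, $5, G)
    }'
}

# Toy GTF: a gene with gene_name, a transcript line (ignored), a gene without gene_name
printf '1\ttoy\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_name "GENEA";\n1\ttoy\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n1\ttoy\tgene\t50000\t52000\t.\t-\t.\tgene_id "G2";\n' \
  | gtf_genes_to_bed > toy_genes.bed
cat toy_genes.bed
```

The first gene becomes `1	11868	14409	GENEA`; the second has no `gene_name`, so its name column is `.`.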

tabix -p bed  genes.bed.gz 
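tabix expects the BED to be sorted by chromosome and then by start position. If indexing complains, one way to check the order is a sketch like the following, which relies only on POSIX `sort -c` (the helper name `check_bed_sorted` and the demo file names are hypothetical):

```shell
# Hypothetical helper: check_bed_sorted FILE.bed.gz
# Exit status 0 if the BED is sorted by column 1, then numerically by column 2.
check_bed_sorted() {
  tab=$(printf '\t')
  # bgzip output is gzip-compatible, so gzip -dc can read it
  gzip -dc "$1" | sort -c -t "$tab" -k1,1 -k2,2n
}

# Demo with plain gzip-compressed toy files
printf '1\t100\t200\tA\n1\t300\t400\tB\n2\t50\t80\tC\n' | gzip > sorted_demo.bed.gz
printf '1\t300\t400\tB\n1\t100\t200\tA\n' | gzip > unsorted_demo.bed.gz
if check_bed_sorted sorted_demo.bed.gz; then echo "sorted_demo: ok"; fi
if ! check_bed_sorted unsorted_demo.bed.gz 2>/dev/null; then echo "unsorted_demo: not sorted"; fi
```

Using the same `-k1,1 -k2,2n` keys as the pipeline means the check and the sort agree under the same locale.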


echo '##INFO=<ID=GENE,Number=.,Type=String,Description="genes">' > genes.header


bcftools annotate -a genes.bed.gz -h genes.header -c CHROM,FROM,TO,GENE --merge-logic GENE:unique in.vcf
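To make the overlap logic concrete, here is a naive awk sketch of the same idea: each variant gets a `GENE=` INFO entry listing the distinct names of the BED intervals that contain its position. This is an illustration only, not how bcftools works internally; it assumes an uncompressed, non-empty BED that fits in memory, and the helper name `annotate_vcf_with_genes`, the gene names and the demo files are hypothetical (the two positions are taken from the question).

```shell
# Hypothetical helper: annotate_vcf_with_genes GENES.bed IN.vcf
# Naive sketch: loads every gene into memory and scans all of them per variant.
annotate_vcf_with_genes() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR {                      # first file: BED (assumes it is non-empty)
      n++; chr[n] = $1; beg[n] = $2 + 0; end_[n] = $3 + 0; name[n] = $4
      next
    }
    /^##/ { print; next }            # keep meta-information lines
    /^#CHROM/ {                      # declare the new INFO tag before the column header
      print "##INFO=<ID=GENE,Number=.,Type=String,Description=\"genes\">"
      print; next
    }
    {
      genes = ""; split("", seen); pos = $2 + 0
      for (i = 1; i <= n; i++) {
        # 1-based POS lies in BED [start, end) when start < POS <= end
        if (chr[i] == $1 && beg[i] < pos && pos <= end_[i] && !(name[i] in seen)) {
          seen[name[i]] = 1          # keep each gene name once, like GENE:unique
          genes = (genes == "" ? name[i] : genes "," name[i])
        }
      }
      if (genes != "") $8 = ($8 == "." ? "GENE=" genes : $8 ";GENE=" genes)
      print
    }' "$1" "$2"
}

# Demo: two overlapping toy genes and a sites-only VCF
printf '1\t50000\t52000\tGENEA\n1\t51000\t51600\tGENEB\n' > demo_genes.bed
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t51479\t.\tT\tA\t570.64\tPASS\tDP=25\n1\t51803\t.\tT\tC\t627.06\tPASS\tDP=19\n1\t90000\t.\tG\tA\t50\tPASS\t.\n' > demo.vcf
annotate_vcf_with_genes demo_genes.bed demo.vcf
```

Here 51479 falls in both toy genes (`DP=25;GENE=GENEA,GENEB`), 51803 only in the first (`DP=19;GENE=GENEA`), and 90000 in neither, so its INFO stays `.`. For real data the tabix-indexed bcftools route above is the practical choice, since it does not scan every gene for every variant.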

Alternatively, build a SnpEff database from your GTF and annotate the VCF with SnpEff: https://pcingola.github.io/SnpEff/snpeff/build_db_gff_gtf/
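For the SnpEff route, the linked page describes placing the annotation as `data/<genome>/genes.gtf`, the reference sequence alongside it, and registering `<genome>.genome : <description>` in `snpEff.config` before running `build -gtf22`. The sketch below only prepares that layout; the helper name `setup_snpeff_genome`, the genome name `my_genome` and all paths are hypothetical, and using `sequences.fa` inside the genome directory is an assumed layout (recent releases may also want CDS/protein FASTA files for validation, so check the linked docs).

```shell
# Hypothetical helper: setup_snpeff_genome SNPEFF_DIR GENOME_NAME GENES.gtf GENOME.fa
# Copies the files into SNPEFF_DIR/data/GENOME_NAME/ and registers the genome once.
setup_snpeff_genome() {
  dir=$1; genome=$2; gtf=$3; fasta=$4
  mkdir -p "$dir/data/$genome" || return 1
  cp "$gtf" "$dir/data/$genome/genes.gtf" || return 1
  cp "$fasta" "$dir/data/$genome/sequences.fa" || return 1
  # Append the config entry only if it is not already present
  if ! grep -qF "$genome.genome :" "$dir/snpEff.config" 2>/dev/null; then
    printf '%s.genome : %s\n' "$genome" "$genome" >> "$dir/snpEff.config"
  fi
}

# Then, with Java and SnpEff installed (not run here):
#   setup_snpeff_genome /path/to/snpEff my_genome in.gtf genome.fa
#   cd /path/to/snpEff && java -jar snpEff.jar build -gtf22 -v my_genome
#   java -jar snpEff.jar my_genome in.vcf > in.ann.vcf
```

The idempotent config append means re-running the setup after replacing the GTF does not duplicate the genome entry.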


Hi Pierre Lindenbaum,

It works, many thanks!
