Locally annotating SNP IDs and Gene names of called variants
1
0
Entering edit mode
12 months ago
entropy ▴ 50

I have GWAS results after variant calling. The VCF file only had CHR (1:22) and POS (12345678 etc) information but the ID column has all ".", namely no rsIDs in it. After GWAS analysis I have a list of variants with CHR and POS information along with their statistical results from Plink2 as a tab delimited text file.

I want to annotate these variants in command line using a tool like VEP if possible. I want to get rsIDs and corresponding gene symbols (GRCh38) and other important annotations. VEP documentation fro command line is a bit not clear to me for my multiple annotation tasks.

Could you please provide me an example command along with example input file names for running VEP for this goal? If VEP is not sufficient for the task alone, other tools like snpEff is also ok.

My GWAS result text file has a header and data like below:

CHR    POS       ID     P         BETA 
4      123456     .     0.005     23.4
... and so on
snpEff VEP SNP GWAS variant • 2.0k views
ADD COMMENT
0
Entering edit mode

Hi Emily I see you suggest VEP for similar questions. It would be great if you could also give an example commadn for my query? Thanks.

ADD REPLY
1
Entering edit mode
12 months ago
barslmn ★ 2.3k

VEP has a great web interface You can try using that.

ADD COMMENT
0
Entering edit mode

Yes, but I want to make VEP as part of my pipeline and thus would like to use the command line version to automate the process. The documentation was a bit confusing for me and could not see an example command that shows how make annotations for both rsID and Gene symbols at he same time given that there is only chr id and coordinate information.

ADD REPLY
0
Entering edit mode

Have you installed the VEP? What was the command you tried?

ADD REPLY
0
Entering edit mode

Yes I downloaded the VEP cache and you can see the details of it below.

I have also craffetd a command that mostly working but it does not return rsIDs. Instead it returned COSMIC IDs. I am currently looking for a way to get rsIDs instead of COSMIC. Below is the command that mostly worked and one example output of VEP:

vep -i vep_ex_input.txt --cache --dir_cache /home/.vep --offline --fork 5 --sift b --canonical  --symbol  --check_existing --assembly GRCh38 --tab --polyphen b --fields "#Uploaded_variation,SYMBOL,Existing_variation,Consequence,IMPACT,SIFT,PolyPhen,CANONICAL,CLIN_SIG,PHENO,Location,Allele,Gene,Feature,Feature_type,GENE_PHENO,BIOTYPE,VARIANT_CLASS,ENSP,NEAREST,STRAND,SYMBOL_SOURCE,UPLOADED_ALLELE,REF_ALLELE,USED_REF,GIVEN_REF,SOMATIC,MAX_AF,AF,EUR_AF,AMR_AF,PUBMED,CELL_TYPE" -o outputs/annotated_output.txt

An an example output of VEP:

-   ZYZ2        COSV10011132    3_prime_UTR_variant,NMD_transcript_variant  MODIFIER    -   -   -   -   1   8:123456789     A   ENSG00000001233 ENST00000123457 Transcript  -   -   -   -   -   -1  HGNC    -   -   -   -   1   -   -   -   -   -   -

And here is the VEP information:

$ vep --show_cache_info --dir_cache /home/.vep
Smartmatch is experimental at /app/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm line 472.
gnomADg v3.1.2
gnomADe r2.1.1
ClinVar 202301
polyphen        2.2.3
sift    6.2.1
1000genomes     phase3
assembly        GRCh38.p14
COSMIC  97
regbuild        1.0
genebuild       2014-07
dbSNP   154
HGMD-PUBLIC     20204
gencode GENCODE 44
ADD REPLY
0
Entering edit mode

8:123456789

In this case there is no rsid for that variant. https://gnomad.broadinstitute.org/region/8-123456769-123456809?dataset=gnomad_r4

Could you try a variant we know to have an rsid like this: 17 43044391 G A

https://www.ncbi.nlm.nih.gov/snp/rs12516

ADD REPLY
0
Entering edit mode

I have run for all the genes, which are 15660 genes in my data and I could not see a single rsID. The "Existing_variation" columns has all the variant IDs. It has either "-" or "COSV12345" like content.

ADD REPLY
0
Entering edit mode

Only other thing comes to my mind is to the with input format.

Could you format your input file into one of these formats: https://www.ensembl.org/info/docs/tools/vep/vep_formats.html

ADD REPLY
0
Entering edit mode

I think I already did so. Here is the head example of my input data format:

1 12345678  12345678    C/T      +
4  32345678 32345678    G/A      +
2 52345678  52345678    T/C      +
ADD REPLY
0
Entering edit mode

Could you try a variant we know to have an rsid like this: 17 43044391 G A

https://www.ncbi.nlm.nih.gov/snp/rs12516

Could you still try running this to see if rsid is returned?

It would be like this in VEP format:

17 43044391 43044391 G/A +

ADD REPLY

Login before adding your answer.

Traffic: 2047 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6