No results from VEP. What am I doing wrong?
4.5 years ago
rsnewbie • 0

I submitted some rs numbers to the Variant Effect Predictor (VEP), pressed Run and got ... no results!

What am I doing wrong?

There were about 40 empty data fields in the reported file.
I want to find the MAFs of a list of SNPs. When I click on the preview run, the MAF is returned. Yet, when I submit a batch, none of the MAFs are reported.

This should be super simple!


This is the file I get back from VEP.

Almost all the fields are empty.

What is especially strange is that when I perform the instant run, it tells me the MAF is 0.1208, yet this information is not given when I push the Run button.

##fileformat=VCFv4.1
##VEP="v100" time="2020-07-06 18:44:22" cache="/net/isilonP/public/ro/ensweb-data/latest/tools/grch37/e100/vep/cache/homo_sapiens/100_GRCh37" db="homo_sapiens_core_100_37@hh-mysql-ens-grch37-web" 1000genomes="phase3" COSMIC="90" ClinVar="201912" ESP="20141103" HGMD-PUBLIC="20194" assembly="GRCh37.p13" dbSNP="153" gencode="GENCODE 19" genebuild="2011-04" gnomAD="r2.1" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|MANE|TSL|APPRIS|SIFT|PolyPhen|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
22  18075053    rs7414  G   A   .   .   CSQ=A|3_prime_UTR_variant|MODIFIER|ATP6V1E1|ENSG00000131100|Transcript|ENST00000253413.5|protein_coding|9/9||||1251|||||rs7414||-1||HGNC|857|||||||||||||||||||||||,A|downstream_gene_variant|MODIFIER|SLC25A18|ENSG00000182902|Transcript|ENST00000327451.6|protein_coding||||||||||rs7414|1293|1||HGNC|10988|||||||||||||||||||||||,A|3_prime_UTR_variant|MODIFIER|ATP6V1E1|ENSG00000131100|Transcript|ENST00000399796.2|protein_coding|8/8||||1090|||||rs7414||-1||HGNC|857|||||||||||||||||||||||,A|downstream_gene_variant|MODIFIER|ATP6V1E1|ENSG00000131100|Transcript|ENST00000399798.2|protein_coding||||||||||rs7414|122|-1||HGNC|857|||||||||||||||||||||||,A|downstream_gene_variant|MODIFIER|SLC25A18|ENSG00000182902|Transcript|ENST00000399813.1|protein_coding||||||||||rs7414|1406|1||HGNC|10988|||||||||||||||||||||||,A|downstream_gene_variant|MODIFIER|ATP6V1E1|ENSG00000131100|Transcript|ENST00000413576.1|protein_coding||||||||||rs7414|2252|-1|cds_end_NF|HGNC|857|||||||||||||||||||||||,A|upstream_gene_variant|MODIFIER|AC004019.13|ENSG00000236754|Transcript|ENST00000443935.1|antisense||||||||||rs7414|3095|-1||Clone_based_vega_gene||||||||||||||||||||||||,A|downstream_gene_variant|MODIFIER|ATP6V1E1|ENSG00000131100|Transcript|ENST00000473248.1|retained_intron||||||||||rs7414|382|-1||HGNC|857|||||||||||||||||||||||

> What is especially strange is that when I perform the instant run, it tells me the MAF is 0.1208

Are you running the online VEP against GRCh37 as well? If not, you'd see different results, because coordinates differ between GRCh37 and GRCh38.
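One way to check which assembly a set of VEP results was annotated against is to read the ##VEP header line of the VCF output; in the file posted above it records assembly="GRCh37.p13" and a 100_GRCh37 cache path. A minimal Python sketch, assuming the header keeps the space-separated key="value" layout seen in this thread (parse_vep_header is a hypothetical helper name, not part of VEP):

```python
import shlex


def parse_vep_header(line):
    """Split a '##VEP=...' VCF header line into a dict of key/value pairs.

    Assumes the layout seen in this thread: ##VEP="v100" followed by
    space-separated key="value" pairs.
    """
    if not line.startswith("##VEP="):
        raise ValueError("not a ##VEP header line")
    fields = {}
    # shlex honours the double quotes, so values containing spaces
    # (e.g. gencode="GENCODE 19") stay in a single token.
    for token in shlex.split(line[2:]):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


header = ('##VEP="v100" cache="/path/to/100_GRCh37" '
          'assembly="GRCh37.p13" gencode="GENCODE 19"')
info = parse_vep_header(header)
print(info["assembly"])  # the build the annotations refer to
```

If the printed assembly does not match the build your positions come from, the co-located variant lookups (and therefore the frequencies) can come back empty or wrong.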


Thank you RAMRS! When technology makes us frustrated, it is reassuring that others can help provide guidance.

GRCh37 does appear to give more info than GRCh38, though not the MAF.

I am unclear on what I could be doing wrong. This should be a super simple, error-free request. Too bad the dbSNP batch request service is no longer available; it was such an easy online tool. Does no one offer such a tool now? Other comments I have read on Biostars and elsewhere have noted that tools for looking up MAFs by rs number are hard to find online now.


> GRCh37 does appear to give more info than GRCh38, though not the MAF.

No, it does not. VEP supports GRCh38 more than GRCh37. Your VEP cache is for GRCh37 (100_GRCh37). MAF is an ambiguous term: where are you looking to get this information from, 1000 Genomes, ExAC or gnomAD? From your value of 0.1208, you're looking for the 1000 Genomes global MAF.
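A related point about the term: VEP's AF columns report the frequency of the input allele (the VEP documentation for the --af flags notes this is not necessarily the minor allele), so for a biallelic SNP the MAF is min(AF, 1 - AF). A small Python sketch; the helper name is hypothetical and multi-allelic sites are not handled:

```python
def minor_allele_frequency(af):
    """MAF of a biallelic site, given the frequency of one of its alleles.

    Assumes exactly two alleles, so the other allele has frequency 1 - af.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"allele frequency out of range: {af}")
    return min(af, 1.0 - af)


# 1000 Genomes global AF reported for the A allele of rs7414 in this thread
print(minor_allele_frequency(0.1208))
```

Here the A allele is already the minor one, so the MAF equals the reported AF; an input allele with AF 0.9 would instead give a MAF of about 0.1.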

Note that the position of rs7414 is 22:18075053 in GRCh37, while it is 22:17592287 in GRCh38. If your VEP cache has gnomAD files, I don't see why the annotated VCF does not have gnomAD frequencies. Could you please share the exact VEP command you used?


Yes, the comment about GRCh37 having more info than GRCh38 did not make sense to me either. I probably changed the input fields.

I retried VEP and now the MAFs are appearing. Yay! The output is below. I have no idea what caused the change.

Here is the command line. I am still not getting a gnomAD MAF.

./vep --af --af_1kg --af_esp --af_gnomad --appris --biotype --buffer_size 500 --check_existing --distance 5000 --mane --polyphen b --pubmed --regulatory --sift b --species homo_sapiens --symbol --transcript_version --tsl --cache --input_file [input_data] --output_file [output_file] --port 3337

Hmm, perhaps I am now getting the MAFs because I left the site and refreshed it.

#Uploaded_variation Location    Allele  Consequence IMPACT  SYMBOL  Gene    Feature_type    Feature BIOTYPE EXON    INTRON  HGVSc   HGVSp   cDNA_position   CDS_position    Protein_position    Amino_acids Codons  Existing_variation  DISTANCE    STRAND  FLAGS   SYMBOL_SOURCE   HGNC_ID MANE    TSL APPRIS  SIFT    PolyPhen    AF  AFR_AF  AMR_AF  EAS_AF  EUR_AF  SAS_AF  AA_AF   EA_AF   gnomAD_AF   gnomAD_AFR_AF   gnomAD_AMR_AF   gnomAD_ASJ_AF   gnomAD_EAS_AF   gnomAD_FIN_AF   gnomAD_NFE_AF   gnomAD_OTH_AF   gnomAD_SAS_AF   CLIN_SIG    SOMATIC PHENO   PUBMED  MOTIF_NAME  MOTIF_POS   HIGH_INF_POS    MOTIF_SCORE_CHANGE
rs7414  22:18075053-18075053    A   3_prime_UTR_variant MODIFIER    ATP6V1E1    ENSG00000131100 Transcript  ENST00000253413.5   protein_coding  9/9 -   -   -   1251    -   -   -   -   rs7414  -   -1  -   HGNC    857 -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   downstream_gene_variant MODIFIER    SLC25A18    ENSG00000182902 Transcript  ENST00000327451.6   protein_coding  -   -   -   -   -   -   -   -   -   rs7414  1293    1   -   HGNC    10988   -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   3_prime_UTR_variant MODIFIER    ATP6V1E1    ENSG00000131100 Transcript  ENST00000399796.2   protein_coding  8/8 -   -   -   1090    -   -   -   -   rs7414  -   -1  -   HGNC    857 -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   downstream_gene_variant MODIFIER    ATP6V1E1    ENSG00000131100 Transcript  ENST00000399798.2   protein_coding  -   -   -   -   -   -   -   -   -   rs7414  122 -1  -   HGNC    857 -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   downstream_gene_variant MODIFIER    SLC25A18    ENSG00000182902 Transcript  ENST00000399813.1   protein_coding  -   -   -   -   -   -   -   -   -   rs7414  1406    1   -   HGNC    10988   -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   downstream_gene_variant MODIFIER    ATP6V1E1    ENSG00000131100 Transcript  ENST00000413576.1   protein_coding  -   -   -   -   -   -   -   -   -   rs7414  2252    -1  cds_end_NF  HGNC    857 -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   upstream_gene_variant   MODIFIER    AC004019.13 ENSG00000236754 Transcript  ENST00000443935.1   antisense   -   -   -   -   -   -   -   -   -   rs7414  3095    -1  -   Clone_based_vega_gene   -   -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
rs7414  22:18075053-18075053    A   downstream_gene_variant MODIFIER    ATP6V1E1    ENSG00000131100 Transcript  ENST00000473248.1   retained_intron -   -   -   -   -   -   -   -   -   rs7414  382 -1  -   HGNC    857 -   -   -   -   -   0.1208  0.2103  0.0893  0.1399  0.0477  0.0777  -   -   -   -   -   -   -   -   -   -   -   -   -   -   27779372,27259692,27777343,27117804 -   -   -   -
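The tab output has one row per overlapping transcript, and in the rows above the frequency columns repeat identically on every row. A Python sketch that keeps one set of frequencies per uploaded variant; it assumes the real file is tab-delimited (the forum rendering shows spaces) and treats '-' as missing. The function name and the column list are illustrative, and values are kept as strings so nothing is lost if a field holds more than a single number:

```python
import csv

# Frequency columns of interest; adjust to the columns your run produced.
FREQ_COLUMNS = ("AF", "AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF", "gnomAD_AF")


def frequencies_by_variant(tab_text, columns=FREQ_COLUMNS):
    """Return {Uploaded_variation: {column: value or None}} from VEP tab output.

    Only the first row of each variant is kept, since the frequencies repeat
    across its transcript rows; '-' (VEP's empty-field placeholder) becomes None.
    """
    # Drop '##' meta lines and blanks; the '#Uploaded_variation' line is the header.
    lines = [ln for ln in tab_text.splitlines() if ln.strip() and not ln.startswith("##")]
    if not lines:
        return {}
    lines[0] = lines[0].lstrip("#")
    result = {}
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        variant = row["Uploaded_variation"]
        if variant not in result:
            result[variant] = {
                col: (None if row[col] == "-" else row[col])
                for col in columns
                if col in row
            }
    return result
```

Run on the output above, this should yield a single rs7414 entry with AF "0.1208" and gnomAD_AF None.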

Great!

I was having trouble with the file formats available with VEP. The pipes were not forming proper columns.
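For anyone who does want the VCF output: the CSQ entry in its INFO column (as in the rs7414 file posted earlier) is a comma-separated list of consequence blocks, each a pipe-separated list whose field names are given after "Format: " in the ##INFO=<ID=CSQ,...> header. A Python sketch that splits it into named columns; both helper names are hypothetical:

```python
import re


def csq_format_from_header(header_line):
    """Return the CSQ field names from the ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in CSQ header line")
    return match.group(1).split("|")


def parse_csq(info_column, field_names):
    """Split the CSQ entry of a VCF INFO column into one dict per consequence.

    Blocks are comma-separated, fields within a block are pipe-separated;
    empty fields are returned as None.
    """
    for entry in info_column.split(";"):
        key, _, value = entry.partition("=")
        if key == "CSQ":
            break
    else:
        return []  # no CSQ annotation on this line
    records = []
    for block in value.split(","):
        values = block.split("|")
        records.append({name: (v or None) for name, v in zip(field_names, values)})
    return records
```

Applied to the rs7414 line above with its full header, the gnomAD_* entries should all come back as None, which shows those fields were genuinely empty rather than mis-split.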

When I simply copy and paste my list of rs numbers, the formatting is perfect.


I have found that the easiest input format for VEP is tabular pseudo-VCF:

You can separate the columns with spaces or tabs; I've aligned the example below so it reads well:

22   18075053 22_18075053_G_A G   A
(chr pos      pseudo-id       ref alt)
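A Python sketch for generating this input from a list of variants, building the ID column as chrom_pos_ref_alt in the same way as the example (to_pseudo_vcf is a hypothetical helper, and tab separation is chosen here):

```python
def to_pseudo_vcf(variants):
    """Format (chrom, pos, ref, alt) tuples as tab-separated pseudo-VCF lines.

    The ID column is chrom_pos_ref_alt so each output row can be traced
    back to its input line.
    """
    lines = []
    for chrom, pos, ref, alt in variants:
        variant_id = f"{chrom}_{pos}_{ref}_{alt}"
        lines.append("\t".join([str(chrom), str(pos), variant_id, ref, alt]))
    return "\n".join(lines) + "\n"


print(to_pseudo_vcf([("22", 18075053, "G", "A")]), end="")
```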

There is a variety of file formats you can use with the Ensembl VEP: https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#input


That is true. I just prefer the pseudo-VCF format because of the custom ID column that makes it easier to map input back to output.
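Since the custom ID encodes the input, the Uploaded_variation column of the output can be split back into its parts and matched to the original records. A Python sketch, assuming IDs of the form chrom_pos_ref_alt and alleles that contain no underscores (parse_pseudo_id is a hypothetical name):

```python
def parse_pseudo_id(variant_id):
    """Recover (chrom, pos, ref, alt) from an ID such as '22_18075053_G_A'.

    rsplit from the right so a chromosome or scaffold name that itself
    contains '_' still splits correctly.
    """
    chrom, pos, ref, alt = variant_id.rsplit("_", 3)
    return chrom, int(pos), ref, alt


# Map an output row back to a hypothetical input record keyed the same way.
inputs = {("22", 18075053, "G", "A"): "my_snp_1"}
print(inputs[parse_pseudo_id("22_18075053_G_A")])
```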
