Question

Help with vcf annotation

18 months ago

HoWI • 0

Hi everyone, I want to add rsIDs to my Dante Labs VCF and later merge it with the 1240k dataset using PLINK. I have done this before on my laptop with an older dbSNP release (138), but SNP overlap with the 1240k dataset was not good. I wanted to try again with the latest dbSNP release (156), but the uncompressed file is a whopping 165 GB, so I cannot use my laptop. I am unfamiliar with usegalaxy, but I still tried to annotate my VCF with bcftools there, and the resulting file had no rsIDs.

Can someone please advise me on this?

I am also getting this message:

“INFO/RS value encountered and set to missing at NC_000001.10:6319593”.

SnpSift appears to be tailor-made for this, but it fails with this error:

“Exception in thread “main” java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3308)
at org.snpsift.annotate.VcfIndexDataChromo.grow(VcfIndexDataChromo.java:103)
at org.snpsift.annotate.VcfIndexDataChromo.add(VcfIndexDataChromo.java:46)
at org.snpsift.annotate.VcfIndex.add(VcfIndex.java:67)
at org.snpsift.annotate.VcfIndex.loadIntervals(VcfIndex.java:245)
at org.snpsift.annotate.VcfIndex.index(VcfIndex.java:183)
at org.snpsift.annotate.DbVcfSorted.open(DbVcfSorted.java:55)
at org.snpsift.annotate.AnnotateVcfDb.open(AnnotateVcfDb.java:395)
at org.snpsift.SnpSiftCmdAnnotate.annotateInit(SnpSiftCmdAnnotate.java:190)
at org.snpsift.SnpSiftCmdAnnotate.annotate(SnpSiftCmdAnnotate.java:70)
at org.snpsift.SnpSiftCmdAnnotate.run(SnpSiftCmdAnnotate.java:410)
at org.snpsift.SnpSiftCmdAnnotate.run(SnpSiftCmdAnnotate.java:397)
at org.snpsift.SnpSift.run(SnpSift.java:588)
at org.snpsift.SnpSift.main(SnpSift.java:76)”

Vcf dbsnp annotation • 1.2k views

score 1 · Answer 1 · 2024-01-05

18 months ago

Pierre Lindenbaum 166k

“but snp overlap with 1240k dataset was not good.”

Are you sure you're using the same build (hg19 vs hg38)? Are you using the same chromosome notation as the dbSNP file (chr1 vs 1, chr1 vs NC_000001, etc.)?
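A minimal Python sketch of that notation check, assuming the contig names of both files have already been collected into lists. The function names and the example lists are hypothetical, and the three styles are just the ones mentioned above (RefSeq accessions such as NC_000001.10, UCSC-style chr1, plain 1).

```python
import re

# RefSeq chromosome accessions look like NC_000001.10 (accession.version).
_REFSEQ = re.compile(r"^NC_\d{6}\.\d+$")
_UCSC = re.compile(r"^chr([1-9]|1\d|2[0-2]|X|Y|M)$")
_PLAIN = re.compile(r"^([1-9]|1\d|2[0-2]|X|Y|MT)$")


def contig_style(name):
    """Classify one contig name as 'refseq', 'ucsc', 'plain', or 'other'."""
    if _REFSEQ.match(name):
        return "refseq"
    if _UCSC.match(name):
        return "ucsc"
    if _PLAIN.match(name):
        return "plain"
    return "other"


def styles(names):
    """Return the set of notation styles seen in a collection of contig names."""
    return {contig_style(n) for n in names}


# Hypothetical contig lists, for illustration only.
my_vcf = ["1", "2", "X"]
dbsnp = ["NC_000001.10", "NC_000002.11"]

# If the two sets differ, position-based annotation cannot match any record.
print(styles(my_vcf), styles(dbsnp))
print("same notation:", styles(my_vcf) == styles(dbsnp))
```

A mismatch here would explain an output with no rsIDs at all. It does not check the build: matching names on different builds (hg19 vs hg38) would still give poor overlap.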

“I wanted to try again with the latest dbsnp file (156) but as the uncompressed file is whopping 165gb in size”

The latest human file I see under https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/ is 'only' 24G, and you shouldn't need to uncompress it.
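To illustrate reading the compressed file directly, here is a small Python sketch that streams a gzip-compressed VCF and collects the distinct values of the first (CHROM) column, without writing an uncompressed copy to disk. It assumes the file is bgzip- or gzip-compressed (bgzip output is valid multi-member gzip, which Python's `gzip` module can read). The function name, the `limit` parameter, and the toy records are made up for the demonstration.

```python
import gzip
import os
import tempfile


def contig_names(path, limit=None):
    """Return distinct CHROM values from a gzip/bgzip VCF, in file order.

    The file is streamed line by line, so nothing is uncompressed to disk.
    `limit` optionally stops after that many data lines (a quick peek).
    """
    seen = {}
    data_lines = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):      # skip meta and header lines
                continue
            seen.setdefault(line.split("\t", 1)[0], True)
            data_lines += 1
            if limit is not None and data_lines >= limit:
                break
    return list(seen)


# Toy file with invented records, only to make the sketch runnable.
demo = os.path.join(tempfile.mkdtemp(), "demo.vcf.gz")
with gzip.open(demo, "wt") as out:
    out.write("##fileformat=VCFv4.2\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    out.write("NC_000001.10\t100\trsTOY1\tA\tG\t.\t.\t.\n")
    out.write("NC_000001.10\t200\trsTOY2\tC\tT\t.\t.\t.\n")
    out.write("NC_000002.11\t300\trsTOY3\tT\tC\t.\t.\t.\n")

print(contig_names(demo))
```

Scanning the full dbSNP file this way would be slow in pure Python. If a tabix index (`.tbi`) sits next to the file, `tabix -l file.vcf.gz` lists the sequence names much faster.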

18 months ago

HoWI • 0

Thanks. I isolated the first column of both my VCF and the dbSNP file (with the 'cut' function on usegalaxy), and yes, dbSNP uses a different chromosome notation from my VCF. Can anything be done about this (some replace function on usegalaxy, perhaps)? As for the uncompression, it appears to be done automatically by either usegalaxy or bcftools.

18 months ago

Pierre Lindenbaum 166k

Replacing the Chr names and position notions in vcf
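One possible way to do the renaming, sketched in Python: build a two-column "old new" table that maps RefSeq accessions to plain or UCSC-style names, which is the layout `bcftools annotate --rename-chrs` expects. The function names and file names are hypothetical, and the mapping assumes the usual human numbering (NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrial genome).

```python
import re

_ACCESSION = re.compile(r"^NC_(\d{6})\.\d+$")


def refseq_to_plain(accession):
    """Map a RefSeq chromosome accession (e.g. NC_000001.10) to 1..22, X, Y or MT.

    Returns None for anything else (unplaced scaffolds, patches, ...).
    """
    match = _ACCESSION.match(accession)
    if not match:
        return None
    number = int(match.group(1))
    if number == 12920:            # NC_012920: human mitochondrial genome
        return "MT"
    if 1 <= number <= 22:
        return str(number)
    return {23: "X", 24: "Y"}.get(number)


def rename_table(accessions, ucsc=False):
    """Build 'old<TAB>new' lines for a chromosome rename file.

    With ucsc=True the new names are chr1..chr22, chrX, chrY, chrM.
    Accessions that are not primary chromosomes are left out.
    """
    lines = []
    for acc in accessions:
        plain = refseq_to_plain(acc)
        if plain is None:
            continue
        if ucsc:
            plain = "chrM" if plain == "MT" else "chr" + plain
        lines.append(f"{acc}\t{plain}\n")
    return "".join(lines)


print(rename_table(["NC_000001.10", "NC_000023.10", "NC_012920.1"]), end="")

# Hypothetical follow-up, with the table saved as rename.txt:
#   bcftools annotate --rename-chrs rename.txt -O z -o dbsnp.renamed.vcf.gz dbsnp.vcf.gz
#   bcftools index -t dbsnp.renamed.vcf.gz
#   bcftools annotate -a dbsnp.renamed.vcf.gz -c ID -O z -o sample.rsid.vcf.gz sample.vcf.gz
```

The accession versions differ between builds (NC_000001.10 is the GRCh37 chromosome 1, NC_000001.11 the GRCh38 one), so the table should be built from the contig names actually present in the dbSNP file being used.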

18 months ago

HoWI • 0

Thanks a lot. This page was also very helpful: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&chromInfoPage=