Fastest way to get rsid from SNP position
4 months ago

Hi! I've been trying to use bedtools to annotate SNPs (get rsid for each SNP), however this seems to need a lot of memory and my jobs always get killed.

bedtools intersect -a ${file} -b ${dbsnp_file} -wa -wb > out.txt

Where dbsnp_file is the entire dbsnp database already sorted.

Are there any other tools? BioMart and VEP cannot handle the number of variants I need to annotate (>100k variants).

snp dbsnp annotation • 387 views

If 'input' is a VCF file:

bcftools annotate  -a "dbsnp.vcf.gz" -c ID input.vcf.gz

They're both in BED format (input and output). How can I convert them to VCF?

4 months ago

this seems to need a lot of memory

sort both files and use bedtools intersect -sorted

If you are trying to intersect very large files and are having trouble with excessive memory usage, please presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files) and then use the -sorted option. This invokes a memory-efficient algorithm designed for large files.
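
To make the advice above concrete, here is a minimal sketch of the sort-then-intersect workflow on toy data. The file names, positions and rsIDs are made up for illustration, and it assumes the dbSNP file is a BED with the rsID in column 4. LC_ALL=C is my addition so that both files end up with the same chromosome order; the bedtools step only runs if bedtools is on the PATH.

```shell
# Toy query SNPs (hypothetical positions), deliberately unsorted.
printf 'chr2\t99\t100\nchr1\t499\t500\nchr1\t99\t100\n' > snps.bed

# Toy dbSNP-like BED with the rsID in column 4 (hypothetical IDs).
printf 'chr1\t499\t500\trs0000002\nchr2\t99\t100\trs0000003\nchr1\t99\t100\trs0000001\n' > dbsnp.bed

# Sort by chromosome, then numerically by start position.
# LC_ALL=C forces plain byte order so both files agree on chromosome order.
LC_ALL=C sort -k1,1 -k2,2n snps.bed > snps.sorted.bed
LC_ALL=C sort -k1,1 -k2,2n dbsnp.bed > dbsnp.sorted.bed

# -sorted tells bedtools both inputs are position-sorted, which enables
# the memory-efficient algorithm mentioned in the quoted documentation.
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -sorted -a snps.sorted.bed -b dbsnp.sorted.bed -wa -wb > out.txt
fi
```

With real data, replace the two printf lines with your own files. If bedtools complains about chromosome order, the two files were most likely sorted with different locale settings.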
