I am trying to expand a set of SNPs (identified by their rsIDs), to include all SNPs in linkage disequilibrium (LD) with the input snps.
I am able to run the script of LD computation from this tutorial as well as the script on this page.
I therefore assume I have installed ensembl correctly on my computer.
I used this as a starting point for my code of getting all SNPs in LD. The code I am now testing is the following. It is run with an input file of SNP rsIDs.
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org',
-user => 'anonymous'
);
my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation' );
$variation_adaptor->db->use_vcf(1); # To get 1000G phase 3 data also
while (<>) {
chomp; # Remove \n from input file line names
my $variation = $variation_adaptor->fetch_by_name($_);
print $variation->stable_id(), "\n";
my @vfs = @{ $variation->get_all_VariationFeatures() };
foreach my $vf (@vfs){
print "get ld values\n";
my $ld = $vf->get_all_LD_values();
print "get ld variations\n";
my @ldvs = @{ $ld->get_variations() };
print "for each ld variation\n";
foreach my $ldv (@ldvs) {
print $ldv->stable_id();
}
}
}
But when I try to run it, I get the following printout. It seems the failing line code is my $ld = $vf->get_all_LD_values
, based on the printouts preceding the error:
$ perl ldCalculation.pl ldSNPs.txt
rs157580
get ld values
Use of uninitialized value $gt[1] in hash element at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 716
.
.
Use of uninitialized value in string ne at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 617.
.
.
Number of genotypes supported by the program (500) exceeded
.
.
Can't call method "get_all_VariationFeatures" on an undefined value at /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm line 870, <OUT> line 29120.
.
in the terminal means the line above is repeated multiple times.
Could you please help me with running the script properly? I am fairly new to perl and the ENSEMBL api, so I might have done some basic coding errors (did try calling the variation functions with and without () with same result).
Also, I have rsIDs in sorted order right now. Would the script also work with rsIDs in an arbitrary order?
Alternative ways of expanding the original set with SNPs in LD are also of interest! Some of the api calls seem extremely slow, and the original SNP set might be quite large, 20k+ for instance.
you could also convert to plink and run ld pruning with plink 1.9