Hello,
I am new to WGS and am struggling to word my Google searches properly. I have 100 haploid samples, their BAMs, 7 million SNPs, and a VCF file. I would like to generate a table with individuals as rows, SNPs as columns, and a 1 or 0 in each cell to denote which allele the individual carries.
What I've tried:
I've tried using ANGSD to get genotype likelihoods, but it takes too long to run, even on the HPC:
echo "angsd -b ALL_bams.txt -ref $REF -out myresult \
-uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -baq 1 \
-minMapQ 20 -minQ 20 -minInd 65 -setMinDepth 7 -doCounts 1 \
-GL 2 -doGlf 4 -dohaplocall 1" > call_gl
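One common way to cut wall-clock time is to split the run by chromosome so that each piece runs as a separate job. The sketch below is a minimal version under a few assumptions: the reference has a samtools `.fai` index next to it, each line of the job file is submitted as its own job (for example through a job array), and ANGSD's `-r` region option accepts `chr:` to mean a whole chromosome (check this against your ANGSD version). The filters are copied unchanged from the command above; the function name and the `myresult_<chr>` output prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Write one ANGSD command per chromosome so each can run as a separate job.
# Assumes the reference FASTA is indexed (samtools faidx creates REF.fai) and
# ALL_bams.txt lists the BAMs, exactly as in the original command.
write_angsd_jobs() {
  local ref=$1 jobfile=${2:-call_gl}
  : > "$jobfile"                      # create or truncate the job file
  # Column 1 of a .fai index holds the sequence (chromosome/contig) name.
  cut -f1 "$ref.fai" | while read -r chr; do
    # "-r chr:" restricts ANGSD to that whole chromosome;
    # all other filters are copied unchanged from the original command.
    echo "angsd -b ALL_bams.txt -ref $ref -out myresult_${chr} -r ${chr}: \
-uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -baq 1 \
-minMapQ 20 -minQ 20 -minInd 65 -setMinDepth 7 -doCounts 1 \
-GL 2 -doGlf 4 -dohaplocall 1" >> "$jobfile"
  done
}

# Usage (then submit each line of call_gl as its own job):
#   write_angsd_jobs "$REF" call_gl
```

Each job writes its own `myresult_<chr>.*` files, so the per-chromosome outputs have to be concatenated (or processed separately) afterwards.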
I've also looked into GATK, but my HPC doesn't have the right version of Java:
Exception in thread "main" java.lang.UnsupportedClassVersionError:
org/broadinstitute/hellbender/Main has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 52.0
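The two numbers in that error map directly onto Java releases: the class file version equals the Java major version plus 44, so 61.0 means the GATK jar was built for Java 17, while the cluster's default `java` only understands up to 52.0 (Java 8). A small guard like the sketch below can catch the mismatch before GATK starts. The function names are hypothetical, and it assumes the usual first line of `java -version` output (`"1.8.0_..."` for Java 8, `"17.0.x"`-style for Java 9 and later).

```shell
#!/usr/bin/env bash
# Extract the Java major version from the first line of `java -version`, e.g.
#   'openjdk version "17.0.2" 2022-01-18' -> 17
#   'java version "1.8.0_292"'            -> 8
java_major() {
  local v
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case $v in
    1.*) v=${v#1.} ;;    # legacy scheme: "1.8" means Java 8
  esac
  echo "${v%%[._-]*}"    # keep only the leading major number
}

# Class file version = Java major version + 44 (52 -> Java 8, 61 -> Java 17).
class_to_java() { echo $(( $1 - 44 )); }

# Usage: refuse to launch GATK if the active java is too old.
#   need=$(class_to_java 61)
#   have=$(java_major "$(java -version 2>&1 | head -n1)")
#   [ "$have" -ge "$need" ] || echo "need Java $need, found Java $have" >&2
```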
I tried installing Java 17 (required by this GATK build), which made the HPC very unhappy...
Error occurred during initialization of VM
Unable to allocate 131072KB bitmaps for parallel garbage collection for the requested 4194304KB heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Any and all advice is welcome! Thank you all.
Look at Java's -Xmx option (it sets the maximum heap size), and run the job on the cluster's compute nodes instead of the login node.
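A minimal sketch of that advice, assuming the `gatk` wrapper script (which passes `--java-options` through to the JVM) and that each generated line is submitted as a batch job with a matching memory request. Capping the heap at roughly 80% of the job's memory is a rule of thumb, not a GATK requirement, and the helper names and `jobs_hc.txt` file are hypothetical. `--sample-ploidy 1` tells HaplotypeCaller the samples are haploid, and `-ERC GVCF` produces per-sample GVCFs that can later be joint-genotyped (GenomicsDBImport or CombineGVCFs, then GenotypeGVCFs) into one multisample VCF.

```shell
#!/usr/bin/env bash
# Turn a job's memory budget (MB) into a JVM heap flag, leaving ~20%
# headroom for non-heap JVM memory (an assumed rule of thumb).
xmx_for_mem() {
  local mem_mb=$1
  echo "-Xmx$(( mem_mb * 80 / 100 ))m"
}

# Write one haploid HaplotypeCaller command per BAM (GVCF mode) to a job
# file, mirroring the echo-into-a-file pattern used for ANGSD.
# Arguments: reference FASTA, file listing BAMs, memory per job in MB, job file.
write_gatk_jobs() {
  local ref=$1 bamlist=$2 mem_mb=$3 jobfile=${4:-jobs_hc.txt}
  local xmx bam name
  xmx=$(xmx_for_mem "$mem_mb")
  : > "$jobfile"
  while read -r bam; do
    [ -n "$bam" ] || continue                # skip blank lines
    name=$(basename "$bam" .bam)
    echo "gatk --java-options \"$xmx\" HaplotypeCaller -R $ref -I $bam" \
         "-O ${name}.g.vcf.gz -ERC GVCF --sample-ploidy 1" >> "$jobfile"
  done < "$bamlist"
}

# Usage: generate the jobs, then submit each line as a batch job that
# requests the same memory (e.g. --mem=8000 under SLURM), not on the login node.
#   write_gatk_jobs "$REF" ALL_bams.txt 8000 jobs_hc.txt
```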
You basically want a transposed multisample VCF file (x = variant, y = sample), i.e. one row per sample and one column per variant.
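A minimal sketch of that transposition using only awk and coreutils, assuming an uncompressed (or `zcat`-piped) multisample VCF with haploid, biallelic GT calls. It first writes a SNP-by-sample table (0 = REF, 1 = ALT, NA = missing or anything else), then transposes it one column at a time with `cut | paste -s`, which avoids holding all 7 million × 100 calls in memory at the cost of one pass over the table per column. The function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a multisample VCF with haploid, biallelic calls into a table with
# one row per sample and one 0/1 column per SNP (transposed relative to the VCF).
# Reads the VCF on stdin; writes the table to the file given as $1.
vcf_to_sample_matrix() {
  local out=$1 tmp ncol c
  tmp=$(mktemp)
  # Step 1: SNP-by-sample table. Header = "sample" + sample names;
  # each data row = CHROM:POS + one code per sample.
  awk -F'\t' -v OFS='\t' '
    /^##/ { next }                               # skip meta-information lines
    /^#CHROM/ { line = "sample"
                for (i = 10; i <= NF; i++) line = line OFS $i
                print line; next }
    { line = $1 ":" $2
      for (i = 10; i <= NF; i++) {
        split($i, f, ":"); gt = f[1]             # GT is the first FORMAT field
        if (gt == "0" || gt == "0/0" || gt == "0|0") code = 0
        else if (gt == "1" || gt == "1/1" || gt == "1|1") code = 1
        else code = "NA"                         # missing, het or multiallelic
        line = line OFS code
      }
      print line }' > "$tmp"
  # Step 2: transpose by turning each column into one output row.
  ncol=$(head -n1 "$tmp" | awk -F'\t' '{ print NF }')
  : > "$out"
  for (( c = 1; c <= ncol; c++ )); do
    cut -f"$c" "$tmp" | paste -s - >> "$out"
  done
  rm -f "$tmp"
}

# Usage, keeping only biallelic SNPs first:
#   bcftools view -m2 -M2 -v snps calls.vcf.gz | vcf_to_sample_matrix genotypes.tsv
```

The first row of the output holds the SNP IDs (`CHROM:POS`); every following row starts with a sample name followed by that sample's 0/1/NA codes.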
Thank you so much! You were absolutely right on both counts :)
Hi, is this user account also yours: evelyn.abbott?