nikitanaik224 · 3.8 years ago
I have a genotyped GVCF of 30 GB that is causing a RAM issue during analysis. I was wondering if there is a way to get around the size issue and process the GVCF in R, or whether I need to chunk it for analysis. Is Python better than R for analysing larger GVCF files?
You should perhaps mention what you want to do with it, in order to get more accurate advice. In general, people use GenotypeGVCFs to make a normal (much smaller) VCF.
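If that's all you need, something like the following GATK4 invocation is the usual route (a sketch only; the reference and file names are placeholders):

    gatk GenotypeGVCFs \
        -R reference.fasta \
        -V input.g.vcf.gz \
        -O genotyped.vcf.gz

The resulting VCF drops the reference blocks and <NON_REF> alleles, so it is typically far smaller than the input GVCF.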
I am trying to remove certain elements from the GVCF file as a preprocessing step for further somatic mutation analysis.
Okay, if you can't be more specific about what you need, then I can't help.
With the previous, smaller GVCFs of individual samples, I expanded the VCF and removed the <NON_REF> elements from the file, creating a subset VCF required for further analysis. I am having trouble loading the genotyped GVCF because of the huge memory required, so I can't perform the steps mentioned above.
I'm not sure what you're trying to do, but loading a 30 GB file into memory in R is never going to work. It's much better to use something that streams the file through memory, like bcftools.
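As a rough sketch (file names are placeholders, and this assumes the records you want to drop are the reference blocks whose only ALT allele is <NON_REF>):

    # stream the GVCF record by record; only one record is in RAM at a time;
    # -e excludes reference blocks, i.e. records whose single ALT is <NON_REF>
    bcftools view -e 'N_ALT=1 && ALT="<NON_REF>"' input.g.vcf.gz -Oz -o subset.vcf.gz

    # optionally trim the now-unused <NON_REF> allele from the remaining records
    bcftools view -a subset.vcf.gz -Oz -o subset.trimmed.vcf.gz

    # index the result for downstream tools
    bcftools index -t subset.trimmed.vcf.gz

You'd want to double-check the filter expression against a few records of your own file before trusting it, since GVCF conventions vary slightly between callers.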