Analysis of genotype g.vcf
3.8 years ago

I have a 30 GB genotype GVCF that is causing RAM problems during analysis. Is there a way to get around the size issue and process the genotype GVCF in R, or do I need to split it into chunks for analysis? Is Python better than R for analyzing larger GVCF files?
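
If chunking turns out to be necessary, it does not require loading the file into memory either. The sketch below is a minimal Python example, assuming a plain or (b)gzip-compressed GVCF; the function name `split_gvcf_by_chrom` and the `<contig>.g.vcf` output naming are hypothetical choices for illustration. It reads one line at a time, writes one chunk per contig, and copies the header into each chunk so every chunk is a valid VCF on its own.

```python
import gzip
import os
from typing import Dict, List, Optional, TextIO


def split_gvcf_by_chrom(path: str, out_dir: str) -> Dict[str, str]:
    """Stream a GVCF and write one file per contig; returns {contig: path}."""
    # gzip.open also reads bgzip output (concatenated gzip members).
    opener = gzip.open if path.endswith(".gz") else open
    os.makedirs(out_dir, exist_ok=True)
    header: List[str] = []  # meta-information lines + #CHROM line (small)
    paths: Dict[str, str] = {}
    current: Optional[str] = None
    out: Optional[TextIO] = None
    try:
        with opener(path, "rt") as src:
            for line in src:  # only one line is held in memory at a time
                if line.startswith("#"):
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom != current:
                    # Keep only one output file open at a time.
                    if out is not None:
                        out.close()
                    if chrom in paths:
                        # Contig seen earlier (unsorted input): append to it.
                        out = open(paths[chrom], "a")
                    else:
                        # Hypothetical naming scheme: <contig>.g.vcf
                        paths[chrom] = os.path.join(out_dir, f"{chrom}.g.vcf")
                        out = open(paths[chrom], "w")
                        out.writelines(header)
                    current = chrom
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each per-contig file can then be analyzed separately, although a single large chromosome may still be too big to load whole in R.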

somatic mutations vcf genotype gvcf R Python

You should probably mention what you want to do with it, to get more accurate advice. In general, people use GenotypeGVCFs to produce a regular (much smaller) VCF.

I am trying to remove certain elements from the GVCF file as a preprocessing step for downstream somatic mutation analysis.

to remove certain elements

Okay, if you can't be more specific about what you need, then I can't help.

With the earlier, smaller per-sample GVCFs, I expanded the VCF and removed the <NON_REF> elements from the file, creating the subset of the VCF required for further analysis. I am having trouble loading the genotype GVCF because of the huge amount of memory required, so I can't perform the steps mentioned above.
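
Assuming "removing <NON_REF> elements" means dropping reference-block records whose only ALT allele is `<NON_REF>` (an interpretation, not something stated in the thread), that step can be done as a stream instead of loading the whole file. A minimal Python sketch; `drop_nonref_only_records` and `filter_gvcf` are hypothetical names:

```python
import gzip
from typing import Iterable, Iterator


def drop_nonref_only_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and all records except pure <NON_REF> blocks."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header unchanged
        elif not line.strip():
            continue  # skip stray blank lines
        elif line.split("\t", 5)[4] != "<NON_REF>":
            yield line  # ALT (5th column) has at least one real allele: keep
        # else: ALT is exactly "<NON_REF>" (reference block): drop it


def filter_gvcf(in_path: str, out_path: str) -> int:
    """Stream in_path to out_path through the filter; return records kept."""
    opener = gzip.open if in_path.endswith(".gz") else open
    kept = 0
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in drop_nonref_only_records(src):
            dst.write(line)
            if not line.startswith("#"):
                kept += 1
    return kept
```

This deliberately leaves records such as `T,<NON_REF>` untouched: stripping `<NON_REF>` from a multi-allelic ALT list would also require rewriting per-allele and per-genotype fields such as AD and PL, which a simple line filter cannot do safely.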

Not sure what you're trying to do, but trying to load a 30 GB file into memory in R is never going to work. It's much better to use something that streams the file through memory, like bcftools.
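
If the downstream analysis is in Python, the same streaming idea can be combined with bcftools: bcftools writes VCF text to a pipe and Python consumes it line by line, so memory use does not grow with file size. A minimal sketch, assuming bcftools is on the PATH; `stream_command_lines` is a hypothetical helper and `input.g.vcf.gz` a placeholder file name:

```python
import subprocess
from typing import Iterator, List


def stream_command_lines(cmd: List[str]) -> Iterator[str]:
    """Run cmd and yield its stdout one line at a time (nothing is buffered whole)."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line
    # Popen's context manager has waited for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


# Hypothetical usage: count data records without loading the file.
# `bcftools view -H` suppresses the header in the VCF output.
# n = sum(1 for _ in stream_command_lines(["bcftools", "view", "-H", "input.g.vcf.gz"]))
```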