13 months ago
sousapaulo16
Hello all,
I am trying to import 100 VCF files (ending in .recode.vcf) into R without reading them one at a time.
All the files are in the same directory (~/Projects/VCF_output/) and have similar names: Replicate_1.recode.vcf, Replicate_2.recode.vcf, etc.
After setting my working directory, I have been trying to use a for (variable in vector) {} loop together with the read.vcfR function, but without any success.
Does anybody have a suggestion?
Thanks in advance
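A minimal sketch of one common approach (not the code from this thread), assuming the vcfR package is installed: list the matching files with list.files() and read them with lapply() instead of an explicit for loop. The helper name read_vcf_dir and its reader argument are hypothetical, added only for illustration.

```r
# Minimal sketch (not from the thread): read every *.recode.vcf in one directory.
# Assumes the vcfR package is installed; `read_vcf_dir` is a hypothetical helper.
read_vcf_dir <- function(dir,
                         reader = function(f) vcfR::read.vcfR(f, verbose = FALSE)) {
  # Full paths of all files whose names end in ".recode.vcf" (pattern is a regex)
  files <- list.files(dir, pattern = "\\.recode\\.vcf$", full.names = TRUE)
  # lapply() replaces the explicit for loop and returns a list of vcfR objects
  vcfs <- lapply(files, reader)
  # Name each element after its file, e.g. "Replicate_1"
  names(vcfs) <- sub("\\.recode\\.vcf$", "", basename(files))
  vcfs
}

# Usage:
# vcfs <- read_vcf_dir("~/Projects/VCF_output")
# vcfs[["Replicate_1"]]
```

list.files() sorts names alphabetically (so Replicate_10 comes before Replicate_2); naming the list elements keeps track of which object came from which file. A for-loop version would preallocate vcfs <- vector("list", length(files)) and fill vcfs[[i]] <- read.vcfR(files[i]) inside the loop.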
You can do something like this in your ~/Projects/VCF_output/ directory. I have not tested it, though.

Thanks to Francisco Pina Martins, I was able to import the 100 VCF files and compute the statistics I intended.
The solution he found uses an R script that is run with the bash command:
The R script is:
Can you explain what you are going to do with the 100 VCFs? Do you want to keep them in memory as separate objects, or do you want to append them into a single vcfR object?

I will use a second R package, dartR, to compute basic statistics such as expected heterozygosity.

That doesn't answer the question, though. It would be more memory-efficient to read in just one VCF, calculate the stats, emit the results, and then move on to the next. Do you need all the VCFs at once to compute these stats? Just trying to understand the problem.
No worries, and sorry for the incomplete answer. I am afraid I do need all the VCF files. Each one is an independent data set. The idea is to compute the same statistics for each file and save those values so I can compare them across the VCF files. Basically, each VCF file has to produce two or three statistics, which ideally will be saved in a data frame.
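The goal of one row of statistics per file, with only one VCF read at a time, could be sketched as follows. This is an illustration under assumptions, not the script from this thread: stats_per_vcf and its reader argument are hypothetical, and compute_stats is a placeholder you would write yourself. If you use dartR, the vcfR object would typically be converted first with vcfR::vcfR2genlight(); check the dartR documentation for the function that reports expected heterozygosity.

```r
# Minimal sketch (not from the thread): one row of statistics per VCF file.
# Each VCF is read, summarised and then dropped before the next one is read.
# `compute_stats` is a hypothetical function you supply; it must return a
# named numeric vector for one object, e.g. c(He = ..., Ho = ...).
stats_per_vcf <- function(files, compute_stats,
                          reader = function(f) vcfR::read.vcfR(f, verbose = FALSE)) {
  rows <- lapply(files, function(f) {
    vcf <- reader(f)          # read a single file
    s <- compute_stats(vcf)   # named vector of statistics for this file
    # One-row data frame: file name plus one column per statistic
    data.frame(file = basename(f), as.list(s), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)        # stack the rows into one data frame
}

# Usage (my_stats is a placeholder you would write, e.g. with dartR):
# files <- list.files("~/Projects/VCF_output", pattern = "\\.recode\\.vcf$",
#                     full.names = TRUE)
# res <- stats_per_vcf(files, compute_stats = my_stats)
# write.csv(res, "vcf_stats.csv", row.names = FALSE)
```

Keeping only the small per-file result, rather than all 100 vcfR objects, is what makes this lighter on memory than reading every file up front.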