Hi, I have a very big VCF file (11.8 GB). The header and first row look like this:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 13372 . G C 608.91 PASS "AC=3;AC_AFR=0;AC_AMR=0
How can I access the #CHROM and POS columns?
Note that I cannot view it in Excel because it's too big. I have also tried the following, but neither worked:
#1
> library(VariantAnnotation)
> vcfFile = system.file(package="VariantAnnotation", "extdata", "ExAC.r1.sites.vep.vcf.gz")
> scanVcfHeader(vcfFile)
Error in .io_check_exists(path(con)) : file(s) do not exist:
''
#2
> vcf<-readVcf("ExAC.r1.sites.vep.vcf.gz","hg19")
Error: cannot allocate vector of size 54 Kb
Any help is highly appreciated.
I would do this task on the Linux command line, as shown below, but if you really need to read it in R you can use fread from library(data.table).
awk 'BEGIN{OFS="\t"} !/^#/ {print $1,$2}' <(gzip -dc yourfile.gz) | gzip > output.txt.gz
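If you also want to keep the "#CHROM POS" column header row in the output, a variant with grep and cut may be enough. This is only a minimal sketch: the file names sample.vcf.gz and chrom_pos.txt.gz are made up for illustration, and the tiny stand-in VCF is there just so the pipeline can be tried before running it on the real 11.8 GB file. It assumes the file is tab-delimited, as VCF normally is.

```shell
# Hypothetical file names; replace them with your own.
in=sample.vcf.gz
out=chrom_pos.txt.gz

# Build a tiny stand-in VCF (one meta line, the column header, one record).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t13372\t.\tG\tC\t608.91\tPASS\tAC=3\n' | gzip > "$in"

# Drop the "##" meta lines, keep the "#CHROM" header row,
# and keep only the first two tab-separated columns.
gzip -dc "$in" | grep -v '^##' | cut -f1,2 | gzip > "$out"

# Peek at the first lines of the result.
gzip -dc "$out" | head
```

Because everything is streamed, the full file is never held in memory. The two-column output should be far smaller than the original, so reading it into R afterwards (for example with fread) is much less likely to hit the allocation error.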