Windows-Based Software Packages Which Can Analyze VCF Files
6
0
12.5 years ago
user56 ▴ 300

I would like to work with VCF files: select one person, or subset one gene, one chromosome, or part of a chromosome.

I tried VMware and Ubuntu with VCFtools, GATK, and tabix, but I ran into a lot of errors. I don't have a Mac. So for the 90% of us who still have to use Windows:

What are the alternatives to tools like VCFtools, GATK, or tabix that work directly under Windows?

(I will take "none" as an answer, but I want to be sure.)

vcf windows • 16k views
ADD COMMENT
0

Why has this post been closed?

ADD REPLY
0

I agree that the post should not have been closed. This is a valid bioinformatics question, and closing should only be applied to a post on which no further discussion is needed. In general it should be reserved for flamewars or similar situations that get out of control.

ADD REPLY
0

Hi, on Windows you can now use the Windows Subsystem for Linux (WSL), which eliminates the need to set up a virtual machine. See https://docs.microsoft.com/en-us/windows/wsl/install

ADD REPLY
3
12.5 years ago
JC 13k

VCF is just a simple tab-delimited text format, so you can use Perl, Python, Ruby, R, or any other programming language to read and filter the file.
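To make the "tab-delimited text" point concrete, here is a minimal Python sketch of splitting one VCF record into its columns. It assumes the standard layout (eight fixed columns, then FORMAT, then one column per sample). The function names and the two-sample example data are made up for illustration and do not come from any real file.

```python
# Column names fixed by the VCF format (the first eight are mandatory).
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_header(line):
    """Return the sample names from the '#CHROM ...' column header line."""
    fields = line.rstrip("\n").split("\t")
    # Layout: 8 fixed columns, then FORMAT, then one column per sample.
    return fields[9:]


def parse_record(line, samples):
    """Split one data line into fixed fields plus a per-sample dictionary."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    # FORMAT (e.g. "GT:DP") names the colon-separated parts of each sample column.
    format_keys = fields[8].split(":") if len(fields) > 8 else []
    record["samples"] = {
        name: dict(zip(format_keys, value.split(":")))
        for name, value in zip(samples, fields[9:])
    }
    return record


# Hypothetical example lines, invented for this sketch.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002\n"
line = "16\t56995900\trs0000\tA\tG\t50\tPASS\tDP=20\tGT:DP\t0|1:11\t1|1:9\n"

samples = parse_header(header)
record = parse_record(line, samples)
print(record["CHROM"], record["POS"], record["samples"]["NA00001"]["GT"])
# prints: 16 56995900 0|1
```

Nothing here is Windows-specific; it only needs a standard Python installation.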

ADD COMMENT
0

The file is 7.8 GB. R chokes on it. I need tabix for Windows so that I can get the data from the FTP site. This answer does not help me.

ADD REPLY
1

Sorry to hear that. Last week I parsed all the VCF files in 1000 Genomes with a Perl script that computed the population frequencies. I didn't load the files into memory; I just read and processed them line by line.
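The Perl script itself is not shown in this thread, so the following is only a rough Python sketch of the same idea: stream the file one line at a time and compute an alternate-allele frequency per site from the GT fields. It assumes genotypes are written like `0|1` or `0/1`, counts only the first ALT allele, and skips missing calls; the function name and demo data are invented for illustration.

```python
import io
import re


def alt_allele_frequencies(lines):
    """Yield (chrom, pos, alt_frequency) one record at a time.

    Only the first ALT allele (allele index 1) is counted, and missing
    calls ('.') are ignored. This is a simplification for illustration.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # sites-only record without genotype columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        alt_count = 0
        called = 0
        for sample_field in fields[9:]:
            genotype = sample_field.split(":")[gt_index]
            for allele in re.split(r"[|/]", genotype):
                if allele == ".":
                    continue
                called += 1
                if allele == "1":
                    alt_count += 1
        if called:
            yield fields[0], int(fields[1]), alt_count / called


# Tiny in-memory stand-in for a real file (made-up data).
demo = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "16\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n"
    "16\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.\n"
)
results = list(alt_allele_frequencies(demo))
for chrom, pos, freq in results:
    print(chrom, pos, freq)
```

For a real compressed file, the same function should accept the handle returned by `gzip.open(path, "rt")`, since it only iterates over lines and never holds more than one record in memory.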

ADD REPLY
2

JC: You are right. We use Perl every day for heavy lifting with no problems. It just takes some careful thinking. Then again, everything in bioinformatics does.

ADD REPLY
0

Thanks for the support.

ADD REPLY
1

Don't read the whole file into memory. Go line-by-line, grab what you need, and operate on it.

ADD REPLY
0

@JC, that was probably a lot of coding when you could have used a tool for that. All I need is to subset some genes and split the file into one file per person, and I don't want to download files larger than 5 GB. You could do that on Linux but not on Windows. Again, is there a Windows counterpart of this:

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 16:56995835-57017756 > genotypes.vcf
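For comparison, the scripting route suggested in the answers could look roughly like the Python sketch below. It is not a tabix replacement: it does not use the index or fetch a remote region, so it has to read through the whole input, which must already be available locally. It keeps the header, restricts records to one region by a simple POS check, and keeps a single sample column. The function name, sample names, and demo data are hypothetical.

```python
import io


def subset_vcf(lines, chrom, start, end, sample):
    """Yield VCF lines restricted to chrom:start-end and one sample column."""
    sample_col = None
    for line in lines:
        if line.startswith("##"):
            yield line  # keep meta-information lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; ValueError if the sample is absent.
            sample_col = fields.index(sample, 9)
            yield "\t".join(fields[:9] + [fields[sample_col]]) + "\n"
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line seen before the data lines")
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield "\t".join(fields[:9] + [fields[sample_col]]) + "\n"


# Tiny in-memory stand-in for a real file (made-up data).
demo = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "16\t56995000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n"
    "16\t56996000\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1\n"
    "17\t56996000\t.\tG\tA\t.\tPASS\t.\tGT\t1|1\t0|0\n"
)
subset = list(subset_vcf(demo, "16", 56995835, 57017756, "S2"))
print("".join(subset), end="")

# With a real bgzip/gzip-compressed file (hypothetical file names):
#   import gzip
#   with gzip.open("input.vcf.gz", "rt") as src, open("genotypes.vcf", "w") as out:
#       out.writelines(subset_vcf(src, "16", 56995835, 57017756, "S2"))
```

Calling it once per sample name would give one file per person, at the cost of re-reading the input each time.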
ADD REPLY
2
12.5 years ago

Install Cygwin or a virtualization system; then you can use most Unix-based tools.

ADD COMMENT
0

I do have Cygwin installed. What I meant by the question was a specialized tool (not a generic language like Perl). Is there VCFtools for Windows? Is there GATK for Windows? There is a Biostar question which states that VCFtools would not compile under Cygwin/Windows.

ADD REPLY
1
12.5 years ago
user56 ▴ 300

There are no good alternatives under Windows with pre-packaged functionality like VCFtools or GATK. (Comment with tools, excluding generic languages, that prove this wrong.)

EDIT LATER:

  • I found that BAMseek can handle a 7 GB VCF file.
  • The PSEQ team mentions that one day there might be a Windows-based implementation.
  • Bioconductor (from comments)
ADD COMMENT
0

[R] - bioconductor

ADD REPLY
0

Thanks. This looks promising.

ADD REPLY
1
12.5 years ago

For working with NGS results in VCF format, one software package that works nicely on Windows is VarSifter (it's Java-based, so it should work nicely on any OS).

If you're planning to query a VCF file that summarizes a lot of information, like the 1000 Genomes or dbSNP files, I would encourage you to carefully reread the VCFtools and tabix manuals, since they are the most appropriate programs for dealing with VCF files. They are simple and straightforward to use, so if you get errors with them, those should be easier to solve than writing your own code. If you still run into errors, writing your own code would be the only option left, but I would again suggest that you try not to reinvent the wheel.

ADD COMMENT
0

VarSifter freezes when you open a 7 GB VCF file.

ADD REPLY
0

And it makes sense that it does. I assume you are giving the Java runtime enough memory through the program launcher and that it still can't handle a file that large. VarSifter behaves like a typical data editor: it lets you browse the data easily and apply the filters you are interested in. It doesn't make sense to open a 7 GB file in any kind of editor. Instead, you should consider handling that file with VCFtools or a scripting approach that lets you filter and collect the data you need without loading the whole 7 GB file into memory.

ADD REPLY
1
11.2 years ago
zhanxw ▴ 20

I compiled tabix and bgzip on Windows, and you can download them here.

ADD COMMENT
0
11.2 years ago

I agree with the above answers: Perl, Python, or Ruby should do the trick. Also, if you want a GUI to analyze VCF files, check out VarSifter.

I haven't used it, but it is written in Java, so it should work on Windows machines.

ADD COMMENT
