Hi everyone,
I am currently taking an online tutorial on Galaxy and I have managed to complete some of the steps. However, the last one requires working with the final VCF file to determine the number of SNPs, INDELs, MNVs, etc. I have no idea how to do that. I have already tried a few things and googled a bit, but unfortunately I haven't gotten anywhere. I also need to get the names of the genes with the largest number of polymorphic sites.
Any ideas on how to extract this information from a VCF file? I have even tried with Excel! I know that isn't very professional, and it is also quite difficult and impractical (not to mention that even then I didn't manage to filter out this info).
Any ideas, please? I will surely give them a try!
Ty!
I don't see the problem... I paste VCF files into Excel, and it works fine for me. However, doing something like finding the names of the genes with the largest number of polymorphic sites sounds pretty random and not very useful, so it's unlikely that there are tools for it.
If you will routinely need specific pieces of information from large VCF files, it might pay to read the VCF specification and learn a scripting language like Python so that you can write custom queries. That will greatly expand your power and the scope of what you can accomplish.
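As a rough sketch of what such a custom query could look like, the script below counts variant types and tallies variant sites per gene. It rests on several assumptions that are not from this thread: the file is uncompressed plain-text VCF, variants are classified only by comparing REF and ALT lengths, and gene names come from a SnpEff-style `ANN=` entry in the INFO column (gene name in the fourth `|`-separated subfield). The names `classify` and `summarize` are made up for illustration. If your file is annotated differently, the gene part will need adjusting.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by allele length (a simplification)."""
    # Missing, spanning-deletion, symbolic, and breakend alleles are lumped together.
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "OTHER"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNV"
    return "INDEL"


def summarize(lines):
    """Return (variant type counts, variant sites per gene) for VCF lines."""
    types = Counter()
    genes = Counter()
    for line in lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts, info = fields[3], fields[4], fields[7]

        # A site can list several comma-separated ALT alleles; count each one.
        for alt in alts.split(","):
            types[classify(ref, alt)] += 1

        # Assumption: SnpEff-style annotation, ANN=allele|effect|impact|gene|...
        site_genes = set()
        for entry in info.split(";"):
            if entry.startswith("ANN="):
                for annotation in entry[4:].split(","):
                    parts = annotation.split("|")
                    if len(parts) > 3 and parts[3]:
                        site_genes.add(parts[3])
        # Count each gene at most once per site.
        genes.update(site_genes)
    return types, genes


# Example use (the file name is a placeholder):
# with open("variants.vcf") as handle:
#     types, genes = summarize(handle)
# print(types)
# print(genes.most_common(10))  # genes with the most variant sites
```

Because `summarize` accepts any iterable of lines, the same function can be called in a loop over many files, which is where a script starts to pay off compared with Excel.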
Thank you for the help, Brian Bushnell! Sure, I agree that working with VCF files in Excel is not a problem; I just wanted to do it in a more automated way with a script, as a way of emulating work with a bunch of files. For that, I should improve my Python first. I managed my way through Excel in the end ;-)