Hi,
I'm trying to use data from the 1001 Genomes Project to solve a problem. More than 1,000 lines have already been sequenced, mapped, and analyzed with SnpEff, and I want to use this data to see all the variants in a particular gene. I found the file 1001genomes_snp-short-indel_only_ACGTN_v3.1.vcf.snpeff.gz at https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snpeff_v3.1/
Basically, it is a very large file with all the variant information. All the line IDs are listed at the beginning of the file, followed by the variants. However, looking at the variants, I can't tell which specific lines carry each change and which do not. For those familiar with SnpEff (I am new to it), is there a way to obtain this information? Am I looking in the right place, or is there some alignment and analysis work I should do on my own?
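In a standard multi-sample VCF, the IDs on the #CHROM header line are sample columns, and every variant row carries one genotype (GT) per sample after the FORMAT column, so which lines carry a change is normally encoded in those columns. Below is a minimal Python sketch that reports, for each variant in a region, the samples with a non-reference allele. It assumes the .snpeff.gz file keeps those per-sample GT columns (a sites-only file would yield nothing), and the function names, demo data, chromosome label and gene coordinates are hypothetical placeholders.

```python
import gzip
import io


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def carriers_in_region(lines, chrom, start, end):
    """Yield (pos, ref, alt, carriers) for each variant in chrom:start-end.

    `carriers` lists the sample IDs (taken from the #CHROM header line) whose
    GT field contains at least one non-reference, non-missing allele.
    """
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # one column per sample after FORMAT
            continue
        if len(fields) < 10 or fields[0] != chrom:
            continue  # other chromosome, or a sites-only row without genotypes
        pos = int(fields[1])
        if not start <= pos <= end:
            continue
        gt_index = fields[8].split(":").index("GT")
        carriers = []
        for sample, data in zip(samples, fields[9:]):
            gt = data.split(":")[gt_index]
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = gt.replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                carriers.append(sample)
        yield pos, fields[3], fields[4], carriers


# Tiny made-up VCF used only to show the output shape.
DEMO_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "acc1", "acc2", "acc3"]),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "1/1", "./."]),
    "\t".join(["1", "150", ".", "C", "T", ".", "PASS", ".", "GT:DP", "0|1:10", "0|0:8", "1|1:5"]),
    "\t".join(["1", "500", ".", "G", "A", ".", "PASS", ".", "GT", "1/1", "1/1", "1/1"]),
    "\t".join(["2", "120", ".", "T", "C", ".", "PASS", ".", "GT", "1/1", "0/0", "0/0"]),
]) + "\n"

if __name__ == "__main__":
    for record in carriers_in_region(io.StringIO(DEMO_VCF), "1", 100, 200):
        print(record)
    # On the real file (hypothetical gene coordinates; check which
    # chromosome names the file actually uses first):
    # with open_vcf("1001genomes_snp-short-indel_only_ACGTN_v3.1.vcf.snpeff.gz") as fh:
    #     for record in carriers_in_region(fh, "1", 1000, 5000):
    #         print(record)
```

On the demo data this prints (100, 'A', 'G', ['acc2']) and (150, 'C', 'T', ['acc1', 'acc3']). Streaming the whole compressed file in pure Python is slow but needs no extra tools; if the file happens to be bgzip-compressed and tabix-indexed, extracting the gene region with tabix first would be much faster.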
Any help is appreciated.
See http://snpeff.sourceforge.net/SnpEff_manual.html#databases. Read through it and let me know if you figure it out.
I figured it out. I took the base-pair positions of the mutations I was interested in from the main file and used grep to search for each position in the individual VCF files (the non-SnpEff files), which I downloaded from the site. When you grep through multiple files at once (using an asterisk wildcard), grep prefixes each match with the name of the file it was found in.
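One caveat with grepping for a bare position number is that it can also match longer numbers (12345 inside 112345) or the same digits in another column or on another chromosome. Here is a minimal Python sketch of the same per-file search that compares only the CHROM and POS columns exactly. It assumes, as the grep approach implies, that each individual VCF lists only the variants found in its own line; the function names, file pattern, chromosome label and position are hypothetical.

```python
import glob
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def has_record(lines, chrom, pos):
    """Return True if any data line has exactly this CHROM and POS."""
    for line in lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if len(fields) >= 2 and fields[0] == chrom and fields[1] == str(pos):
            return True
    return False


def files_with_position(paths, chrom, pos):
    """Return the paths whose VCF contains a record at chrom:pos."""
    hits = []
    for path in paths:
        with open_vcf(path) as fh:
            if has_record(fh, chrom, pos):
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Hypothetical file pattern, chromosome and position.
    print(files_with_position(sorted(glob.glob("*.vcf")), "1", 12345))
```

The returned file names play the same role as grep's file-name prefixes, but a hit only counts when the record sits at exactly that chromosome and position.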