Hi all,
I want to retrieve SNPs from a very large VCF file.
I stored the chromosome and position of my SNPs of interest in a temp file, as below:
chr1 2487663 rs2227312 C A 100.0 PASS DP=1825;ASP=true;CAF=0.4008,0.5992;COMMON=1;G5=true;G5A=true;GENEINFO=LOC115110:115110|TNFRSF14:8764;GNO=true;HD=true;KGPhase1=true;KGPhase3=true;R5=true;RS=2227312;RSPOS=24
I used to retrieve records from the source VCF with tabix by giving a range:
tabix source.vcf.gz chr1: 222-245
But this time, since it is a SNP, I can only give a start position:
tabix source.vcf.gz chr1: 2,487,663
or
tabix source.vcf.gz chr1:2,487,663-2,487,663
But it doesn't work. Furthermore, not all of the SNPs have a dbSNP ID, so I cannot retrieve them by ID.
Could you give me some suggestions?
Thank you!
Thanks for your response. I have compressed and indexed the VCF file. The first few lines of source.vcf.gz are shown below:
I tried to retrieve the above SNP from source.vcf.gz:
But it doesn't return anything. Did I make a mistake?
Yes, it should be "1:10019-10019", not "chr1:10019-10019".
I figured out the issue:
In the source VCF file, the chromosome column contains a plain number without the "chr" prefix. So, my command line should be:
It works well now. Thank you so much!
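For anyone who hits the same problem with a whole list of SNPs, here is a minimal sketch of how the check and the lookup could be scripted. The names `snps.txt` (the temp file of SNPs of interest) and `regions.txt` are hypothetical, and the "chr" prefix is stripped only on the assumption that `tabix -l` shows unprefixed names, as in this thread. It relies on `tabix -l` to list the sequence names in the index and on `tabix -R`, which accepts a tab-delimited file of CHROM and POS.

```shell
#!/bin/sh
# Hypothetical file names: snps.txt (SNPs of interest, "chr1 2487663 ..."),
# regions.txt (generated here), source.vcf.gz (bgzipped and tabix-indexed).

# Turn whitespace-separated "chr1 2487663 ..." lines into tab-separated
# CHROM, POS pairs, dropping a leading "chr" so the names match a VCF
# whose chromosome column holds plain numbers.
to_regions() {
    awk '{ sub(/^chr/, "", $1); printf "%s\t%s\n", $1, $2 }'
}

# Run the tabix steps only when the tool and the input files are present.
if command -v tabix >/dev/null 2>&1 && [ -f source.vcf.gz ] && [ -f snps.txt ]; then
    # Show the sequence names stored in the index ("1" vs "chr1").
    tabix -l source.vcf.gz

    # Look up every listed position in one pass.
    to_regions < snps.txt > regions.txt
    tabix -R regions.txt source.vcf.gz
fi
```

If `tabix -l` prints names that already carry the "chr" prefix, the `sub()` call should be removed so the names are passed through unchanged.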