I have detected SNPs in whole genomes with GATK and now have .vcf files. My question is how to extract the SNPs for one gene. I do not know the name of this gene, but I have a 3 kb FASTA file that corresponds to it. Could you give me the vcftools command to extract the SNPs from this specific region? Thank you for your help.
If you want to detect SNPs in a specific region, I advise you to do SNP calling on all genomes first (align the reads with a mapper such as bwa, then run a variant caller such as freebayes).
Then you can extract the SNPs of the specific region you are interested in with vcftools.
Hi guillaume.rbt,
I have detected SNPs in whole genomes with GATK and now have .vcf files. My question is how to extract the SNPs for one gene. I do not know the name of this gene, but I have a 3 kb FASTA file that corresponds to it. Could you give me the vcftools command to extract the SNPs from this specific region? Thank you for your help.
That is important information which you should have provided in your first post. Please update your question and be as precise as possible. You just wasted 4 hours by not explaining your issue sufficiently.
I recovered the position of my gene with BLAST; it is on chromosome 2 (2:2241384-2244383). I searched vcftools for a command that would let me study the SNPs in this region, but I have not found one. Can you help me?
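For a region given as coordinates like this, vcftools has position filters (`--chr`, `--from-bp` and `--to-bp`, combined with `--recode` to write the kept records to a new VCF). To make explicit what such a filter does, here is a minimal Python sketch; it is an illustration, not the vcftools implementation. The function name `extract_region` and the demo records are made up, and it assumes an uncompressed, tab-delimited VCF in which the chromosome is named `2` rather than `chr2`.

```python
def extract_region(vcf_lines, chrom, start, end):
    """Yield header lines plus records on `chrom` with start <= POS <= end.

    POS is the 1-based position in the second VCF column; both ends are inclusive.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


# Hypothetical records, only to show the call; real input would be the lines of snps.vcf.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "2\t2241000\t.\tA\tG\t50\tPASS\t.\n",
    "2\t2242000\t.\tC\tT\t50\tPASS\t.\n",
]
for kept in extract_region(demo, "2", 2241384, 2244383):
    print(kept, end="")
```

With a real file you would pass the open file handle instead of `demo` and write the kept lines to a new .vcf file.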
If your FASTA has metadata in its record header that points to its location on the genome, you can use that directly to map any SNPs to it via BEDOPS bedmap and vcf2bed:
Replace chr2 with 2, depending on the format of the chromosome names in your snps.vcf file. This is most likely either UCSC style (chr2) or Ensembl style (2).
The file answer.bed will contain the 3 kb interval and a listing of all SNP ID values that map to (or associate with, or overlap) that interval.
If you don't have metadata in the FASTA header that tells you where you are, you could use a BLAST search on the sequence to get back the location of your sequence for your genome of interest.
Then you run the command above, again replacing what goes into echo with whatever region comes out of the BLAST search.
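The coordinate handling behind this kind of mapping can be sketched in Python. This is only an illustration of the idea, not what bedmap or vcf2bed do internally, and the helper names (`blast_hit_to_bed`, `strip_chr`, `snp_ids_in_interval`) are made up. It assumes that BLAST reports 1-based, inclusive subject coordinates (with start greater than end for minus-strand hits), that BED intervals are 0-based and half-open, and that the VCF is uncompressed and tab-delimited. Note that the ID column is often just `.` in VCFs where the SNPs have no assigned names.

```python
def blast_hit_to_bed(chrom, sstart, send):
    """Convert a 1-based, inclusive BLAST subject range to a 0-based, half-open BED interval."""
    low, high = sorted((sstart, send))  # minus-strand hits report sstart > send
    return chrom, low - 1, high


def strip_chr(name):
    """Reduce a UCSC-style name such as 'chr2' to the Ensembl-style '2'."""
    return name[3:] if name.startswith("chr") else name


def snp_ids_in_interval(vcf_lines, interval):
    """Return the ID column of every VCF record whose position falls inside the BED interval."""
    chrom, bed_start, bed_end = interval
    ids = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        pos0 = int(fields[1]) - 1  # VCF POS is 1-based; BED starts are 0-based
        if strip_chr(fields[0]) == strip_chr(chrom) and bed_start <= pos0 < bed_end:
            ids.append(fields[2])
    return ids


# The region from the BLAST search in this thread, written as a BED interval.
region = blast_hit_to_bed("2", 2241384, 2244383)
print(region)
```

Comparing chromosome names through `strip_chr` is a convenience so that `chr2` and `2` match; whether that is safe depends on the naming used in your reference genome.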
Hi H.Hasani. As I said to guillaume, I have already detected SNPs in whole genomes (fungi). What interests me now is looking directly at the SNPs of one 3 kb gene. I have a FASTA file of this gene.
Can I add the -L option followed by the 3 kb FASTA file of this gene?
No, you cannot; the -L option takes the genomic coordinates of your region of interest. Since you have already done the SNP calling, there is no point in repeating it.
It is unclear which data you have available. Please elaborate and be precise.
It is done. I have rewritten the question.