Detection of SNPs
3
0
Entering edit mode
6.5 years ago
kamel ▴ 70

Hi Biostars,

I have detected SNPs in whole genomes by GATK and I have .vcf files. my question is how I can extract SNPs from this gene (I do not know the name of this gene but I have a fasta file of 3kb which corresponds to my gene of interest). could you give me the vcftools command to extract the snps from this specific region. Thank you for your help

SNP genome alignment gene • 2.3k views
ADD COMMENT
0
Entering edit mode

It is unclear which data you have available. Please elaborate and be precise.

ADD REPLY
0
Entering edit mode

it is done . I redid the question

ADD REPLY
1
Entering edit mode
6.5 years ago
guillaume.rbt ★ 1.0k

If you want to detect SNPs on a specific region I advise you to do a SNP calling on all genomes (aligning reads with a mapper like bwa, then use a variant caller, as freebayes).

Then you can extract the SNPs of the specific region you are interested in with vcftools.

ADD COMMENT
0
Entering edit mode

HI guillaume.rbt I have detected SNPs in whole genomes by GATK and I have .vcf files. my question is how I can extract SNPs from this gene (I do not know the name of this gene but I have a fasta file of 3kb which corresponds to my gene of interest). could you give me the vcftools command to extract the snps from this specific region. Thank you for your help

ADD REPLY
2
Entering edit mode

That is important information which you should have provided in your first post. Please update your question and be as precise as possible. You just wasted 4 hours by not explaining your issue sufficiently.

ADD REPLY
0
Entering edit mode

You need to get the position of your gene to extract the SNPs. For that you can do a blast with your fasta against your genome : https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch

Then with the positions you can use the --position parameters from vcftools to filter your vcf : http://vcftools.sourceforge.net/man_latest.html

ADD REPLY
0
Entering edit mode

I recovered the position of my gene by blast, it is in chromosome 2 (2: 2241384-2244383). I search on vcftools to a commade that allows me to study the snp in this region but I have not found. can you help me

ADD REPLY
2
Entering edit mode

second thought, bedtools intersect is even easier to use :

echo -e "2\t2241384\t2244383" > gene_position.bed
bedtools intersect -a ./your_vcf.vcf -b ./gene_position.bed > filtered_vcf.vcf
ADD REPLY
0
Entering edit mode

Thanks for your help guillaume.rbt

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLY
1
Entering edit mode
6.5 years ago

If your FASTA has metadata in its record header that points to its location on the genome, you can use that directly to map any SNPs to it via BEDOPS bedmap and vcf2bed:

$ echo -e 'chr2\t2241384\t2244383' | bedmap --echo --echo-map-id-uniq --delim '\t' - <(vcf2bed < snps.vcf) > answer.bed

Replace chr2 with 2, depending on the format of chromosome name in your snps.vcf file. This could either be UCSC (chr2) or Ensembl (2), most likely.

The file answer.bed will contain the 3k nt interval and a listing of all SNP ID values that map to — or associate with, or overlap — that interval.

If you don't have metadata in the FASTA header that tells you where you are, you could use a BLAST search on the sequence to get back the location of your sequence for your genome of interest.

Then you run the command above, again replacing what goes into echo with whatever region comes out of the BLAST search.

ADD COMMENT
0
Entering edit mode
6.5 years ago
H.Hasani ▴ 990

I would simply use GATK HaplotypeCaller with -L option

ADD COMMENT
0
Entering edit mode

Hi H.Hasani. as I said to guillaume. I have detected snp of whole genomes (fungi). then that interests me to look directly snp of a gene of 3kb. I have a fasta file of this genes. I can add the opion -L followed by this fasta file of 3kb of this gene ???

ADD REPLY
0
Entering edit mode

No you cannot, the -L option takes the genomic coordinates of your region of interest. Since you already did SNP calling there is no point in repeating this.

ADD REPLY
0
Entering edit mode

Exactly, it wasn't quite clear in the question. I believe guillaume.rbt has given you the correct answer.

ADD REPLY

Login before adding your answer.

Traffic: 2214 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6