Extracting snps from a range of coordinates
4
2
Entering edit mode
6.1 years ago
S AR ▴ 80

I have a list of genes. and i want to extract the snps and indels from my VCF file (that i generated using GATK pipeline ) from genes coordinates on . The list of genes coordinates:

Gene Name   Accession_no.   Start_Position   End_Position   Strand 
Rv0194         NC_000962.3       226878         230462            +

I was looking bedtools but it is asking for .bed format of genes nd as well .bed of bam files. how to do it ? or any other options/tools/scripts?

Like i tried tabix:

bgzip ERR038736_UnifiedGenotyper_variants_raw_snp.vcf
tabix ERR038736_UnifiedGenotyper_variants_raw_snp.vcf.gz
tabix ERR038736_UnifiedGenotyper_variants_raw_snp.vcf.gz AL123456.3:226878-230462 > Rv0194

and this gave me the variants like this:

AL123456.3      227098  .       T       C       6730.77 .       AC=2;AF=1.00;AN=2;DP=172;Dels=0.
AL123456.3      228069  .       G       A       7132.77 .       AC=2;AF=1.00;AN=2;BaseQRankSum=-
AL123456.3      228168  .       G       C       6682.77 .       AC=2;AF=1.00;AN=2;DP=171;Dels=0.

But this is not a vcf file and i can only extract it one at a time. I want to extract all variants against a list of coordinates and store it in a vcf output.

Can anyone help me it this?

vcf SUB-SET SNP gene coordinates • 4.0k views
ADD COMMENT
4
Entering edit mode
6.1 years ago

Hello,

you can convert this list into a valid bedfile by:

$ cut -f2-4 genes.txt|tail -n+2 > genes.bed

You can than take this bed file with tabix

$ tabix input.vcf.gz -R genes.bed

fin swimmer

ADD COMMENT
0
Entering edit mode

can you please explain what -f2-4 is doing? and what -n+2 is doing

ADD REPLY
0
Entering edit mode

Sure :)

  • cut -f2-4 select the columns 2 to 4 which contain the Accession_no., start and end position
  • tail -n+2 prints all from the second line onward. This is necessary to get rid of the header line.

fin swimmer

ADD REPLY
0
Entering edit mode

Oh great it superb thanks ..

ADD REPLY
1
Entering edit mode

Hello angelshiza,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work. This will aid others who have similar problems in the future.

Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

it doesnot give the output in vcf format and i want the vcf file in the end

ADD REPLY
1
Entering edit mode

Is this a comment to my answer?

tabix will output a vcf. If you like to output the header data as well type:

$ tabix -h input.vcf.gz -R genes.bed

fin swimmer

ADD REPLY
0
Entering edit mode

Oh that is great thank you!

ADD REPLY
3
Entering edit mode
6.1 years ago

BEDOPS:

$ vcf2bed --insertions < snps.vcf > insertions.bed
$ vcf2bed --deletions < snps.vcf > deletions.bed
$ vcf2bed --snvs < snps.vcf > snvs.bed

https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html

ADD COMMENT
0
Entering edit mode
6.1 years ago

Check bedtools intersect angelshiza

ADD COMMENT
0
Entering edit mode
6.1 years ago
Tm ★ 1.1k

If you have the gtf/gff3 file of gene annotation along with variant's .vcf file then I will suggest you to annotate your variants using snpeff as it will not only provide information on which SNPs belongs to which gene but it will also determine the effect of the variants.

ADD COMMENT
0
Entering edit mode

No i dont have gff/gft. I have multiple genes with coordinate can i make gff/gft of many genes in one file ?? Can you share if you know.

Thank you Sar

ADD REPLY
0
Entering edit mode

How you have identified the genes? means have you used any specific tool for gene prediction?

ADD REPLY
0
Entering edit mode

i have the gene list with coordinates:

Gene Name   Accession_no.   Start_Position   End_Position   Strand 
Rv0194         NC_000962.3       226878         230462            +
Rv3695         NC_000962.3       226888         230452            +
Rv1144         NC_000962.3       66878         98462            +
ADD REPLY

Login before adding your answer.

Traffic: 1786 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6