Extracting variations in gene regions and within 100 bp of gene boundaries from multiple VCF files
3.3 years ago
VenGeno ▴ 100

Hi,

I sincerely hope that I am not repeating an already answered question. I couldn't find the answer to my exact problem.

I have three VCF files derived using bcftools (isec). The three files contain the variations, relative to the reference sequence, that the varieties have in common. In the end, I have:

  • Three VCF files representing three varieties (including only the common variations)
  • Reference FASTA file
  • Annotation (gff3) file for reference.

I want to extract the variations found in:

  1. Gene region
  2. The 100 bp flanking the TSS (+1) and the stop codon

Please note that this is a 5 Mb region (not a whole genome, so there are no chromosomes).

I would appreciate it if someone could help me with this. Thank you!

VCF Variations • 743 views
3.3 years ago
Tm ★ 1.1k

You can try a variant annotation tool such as SnpEff. It will add gene-related information (exonic, intronic, intergenic, etc.) to each record in your VCF file.
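
If you only need to subset the records rather than annotate them, the filtering can also be done directly from the gff3 file. Below is a minimal Python sketch, not a tested pipeline. It assumes that the `gene` features in the gff3 are a good enough stand-in for the TSS-to-stop-codon span (if you need the exact stop codon you would have to use the CDS features instead), that the sequence name in column 1 of the gff3 matches the CHROM column of the VCF, and that the VCF is uncompressed. The function names `gene_windows`, `filter_vcf` and `extract` are made up for this example.

```python
from collections import defaultdict

FLANK = 100  # bp added on each side of every gene (value taken from the question)


def gene_windows(gff3_lines, flank=FLANK):
    """Return {seqid: [(start, end), ...]} for 'gene' features, padded by `flank` bp."""
    windows = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # assumption: only 'gene' rows define the regions of interest
        start, end = int(cols[3]), int(cols[4])  # GFF3 coordinates are 1-based, inclusive
        windows[cols[0]].append((max(1, start - flank), end + flank))
    return windows


def filter_vcf(vcf_lines, windows):
    """Yield header lines plus every record whose POS falls inside a window."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the VCF header unchanged
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        chrom, pos = cols[0], int(cols[1])  # VCF POS is 1-based
        if any(start <= pos <= end for start, end in windows.get(chrom, [])):
            yield line


def extract(gff3_path, vcf_path, out_path):
    """Write the subset of `vcf_path` that lies in gene regions +/- FLANK to `out_path`."""
    with open(gff3_path) as gff:
        windows = gene_windows(gff)
    with open(vcf_path) as vcf, open(out_path, "w") as out:
        out.writelines(filter_vcf(vcf, windows))
```

You would call `extract()` once per VCF file, with your own file paths. Note that the check uses only POS, so a long deletion that starts just upstream of a window and runs into it would be missed, and the linear scan over the windows is only reasonable because the region is small (5 Mb).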
