Comparison and Visualization of VCF derived from Two Genomes (tolerant and susceptible) #DNA-Seq
7.0 years ago

Hi Friends,

I already have two VCF files generated using SAMtools (susceptible and tolerant) for Hevea (only a draft genome is available, and it is not very well annotated). I would like to:

  • compare the two, and
  • point out possible SNPs

that might be responsible for their differential response to the pathogen. I would really appreciate it if you could suggest a GUI tool, with instructions, that I can use. I visualized the files using IGB, but I am stuck at that point.

Thanks in advance,

Venura

WGS DNAseq VCF Variant • 1.9k views
7.0 years ago

Hi Venura!

The question about the genetic background of pathogen resistance in Hevea brasiliensis (the rubber tree) is intriguing. However, it is not strictly a bioinformatics question. I am afraid there is no method that will give you the answer about pathogen resistance from two VCFs and a reference genome alone.

The article about the reference genome (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008842/) says that the authors identified 473 genes related to disease resistance. Maybe you should start by studying the transcriptional differences between the two varieties you have? Once you have RNA-seq of at least 3 resistant plants and 3 controls, you can see which genes are differentially expressed (DE). Then you can check whether any of the 473 genes from the article are differentially expressed in your experiment. If so, you can start asking about the causes of the DE. Maybe it will just be a different copy number of certain genes, maybe some SNPs. If there is no intersection between the set of 473 and your DE set, maybe you have discovered something new about resistance!

If you already have WGS of 2 varieties, you may check the copy number of R genes in the two genomes. If there is a difference, you might check the expression level of those genes in the lab.
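Before running a dedicated caller, a very rough first look at copy number could compare normalised read depth per gene between the two samples. The sketch below is only an illustration under several assumptions: the input lines are in the three-column, tab-separated format that `samtools depth` writes (chromosome, 1-based position, depth), the gene coordinates are 1-based and inclusive, and the function name `normalised_gene_depth` and the toy data are made up for this example. It is not a substitute for a proper CNV tool.

```python
def normalised_gene_depth(depth_lines, genes):
    """Mean depth per gene divided by the sample-wide mean depth.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' lines (1-based pos).
    genes: list of (chrom, start, end, name), 1-based inclusive coordinates.
    """
    gene_sum = {name: 0 for _, _, _, name in genes}
    total, n = 0, 0
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        total += depth
        n += 1
        for g_chrom, start, end, name in genes:
            if chrom == g_chrom and start <= pos <= end:
                gene_sum[name] += depth
    # Assumption: the mean over reported positions stands in for genome-wide depth.
    sample_mean = total / n
    return {
        name: (gene_sum[name] / (end - start + 1)) / sample_mean
        for _, start, end, name in genes
    }


# Toy data (hypothetical): geneA is covered twice as deeply as geneB in "tolerant".
genes = [("1", 1, 4, "geneA"), ("1", 5, 8, "geneB")]
tolerant_depth = [f"1\t{p}\t20" for p in range(1, 5)] + [f"1\t{p}\t10" for p in range(5, 9)]
susceptible_depth = [f"1\t{p}\t10" for p in range(1, 9)]

tol = normalised_gene_depth(tolerant_depth, genes)
sus = normalised_gene_depth(susceptible_depth, genes)
ratio = {name: tol[name] / sus[name] for name in tol}
for name, value in ratio.items():
    print(name, round(value, 2))
```

A ratio well above or below 1 for an R gene would only be a hint worth following up; mapping bias, GC content and the fragmented draft assembly can all produce the same signal.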

Good luck!

Sergey


Hi Sergey,

Thank you so much for the advice. As you suggested, we are planning to do RNA-seq and qPCR in the future. I have both BAM and VCF files. I visualized them in IGV, but I don't understand how to proceed from there. As an example, let's say I want to work with those 473 genes. I can't find an easy way to search for and extract the variants (in the two samples) for those genes other than doing it manually. The same goes for the CNV part. If possible, can you please suggest some tools to do this efficiently?

Thank you again,

Cheers,

Venura


Hi Venura!

1. If you aligned your reads to the reference genome, you have its annotation, i.e. where the genes are (coordinates), something like a BED file:

chr  start  stop  gene
1    100    200   gene1
1    300    500   gene2

Then you may use bedtools to extract variants for particular regions: http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html

2. After SAMtools you only have small variants (SNPs and indels) in your VCF file. To call CNVs you have to rerun the calling with another tool. You may try Manta (https://github.com/Illumina/manta) or CNVkit (https://github.com/etal/cnvkit).
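For point 1, if you prefer a script to the bedtools command line, the same idea can be sketched in plain Python: keep only the VCF records that fall inside the gene intervals, then compare the two samples. This is a minimal sketch, assuming a four-column BED file without a header line (BED coordinates are 0-based and half-open, VCF positions are 1-based) and uncompressed VCF text; the function names and the toy records are hypothetical.

```python
from collections import defaultdict


def load_regions(bed_lines):
    """Parse BED lines (chrom, start, end, name) into a dict keyed by chromosome."""
    regions = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split()[:4]
        regions[chrom].append((int(start), int(end), name))
    return regions


def variants_in_regions(vcf_lines, regions):
    """Return {(chrom, pos, ref, alt): gene} for VCF records inside any region."""
    hits = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        pos = int(pos)
        for start, end, name in regions.get(chrom, []):
            # BED start is 0-based and end is exclusive; VCF POS is 1-based.
            if start < pos <= end:
                hits[(chrom, pos, ref, alt)] = name
                break
    return hits


# Toy input (hypothetical records, not real Hevea data).
bed = ["1\t100\t200\tgene1", "1\t300\t500\tgene2"]
tolerant_vcf = [
    "##fileformat=VCFv4.2",
    "1\t150\t.\tA\tG\t50\t.\t.",
    "1\t250\t.\tC\tT\t50\t.\t.",
    "1\t400\t.\tG\tA\t50\t.\t.",
]
susceptible_vcf = [
    "##fileformat=VCFv4.2",
    "1\t150\t.\tA\tG\t50\t.\t.",
    "1\t450\t.\tT\tC\t50\t.\t.",
]

regions = load_regions(bed)
tol = variants_in_regions(tolerant_vcf, regions)
sus = variants_in_regions(susceptible_vcf, regions)

shared = sorted(set(tol) & set(sus))
only_tolerant = sorted(set(tol) - set(sus))
only_susceptible = sorted(set(sus) - set(tol))
print("shared:", shared)
print("only tolerant:", only_tolerant)
print("only susceptible:", only_susceptible)
```

With real files you would pass open file handles instead of the lists. Keep in mind that a variant missing from one VCF may simply reflect low coverage in that sample rather than a true difference, so it is worth checking the BAM at those positions.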

Sergey
