7.1 years ago
Kritika
Hi all,
I recently received whole-genome sequencing data for an allotetraploid plant genome, generated on Illumina.
The basic pipeline I am following is:
1) FastQC (quality check)
2) Trimming, if required
3) Mapping against the reference
4) SNP calling using GATK or SAMtools
5) SNP prioritization using SnpEff
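The five steps above could be strung together roughly as below. This is a sketch, not a tested pipeline: all file names, the sample name and the SnpEff database name are hypothetical, the tools are assumed to be on your PATH (e.g. the `trimmomatic` and `snpEff` wrappers that bioconda installs), and the Trimmomatic adapter file `TruSeq3-PE.fa` is assumed to be in the working directory. The script only prints the commands so you can review them before running anything.

```python
# Hypothetical inputs; replace with your own files and names.
REF = "ref.fa"
R1, R2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"
SAMPLE = "sample1"
SNPEFF_DB = "my_cotton_db"  # hypothetical SnpEff database built for your reference


def build_pipeline(ref: str, r1: str, r2: str, sample: str, snpeff_db: str) -> list[str]:
    """Return shell commands for QC -> trim -> map -> call -> annotate, in order."""
    t1, t2 = f"{sample}_R1.trim.fq.gz", f"{sample}_R2.trim.fq.gz"
    u1, u2 = f"{sample}_R1.unpaired.fq.gz", f"{sample}_R2.unpaired.fq.gz"
    bam, vcf = f"{sample}.bam", f"{sample}.vcf"
    return [
        # 0) one-off reference indexing for bwa, samtools and GATK
        f"bwa index {ref}",
        f"samtools faidx {ref}",
        f"gatk CreateSequenceDictionary -R {ref} -O {ref.rsplit('.', 1)[0]}.dict",
        # 1) quality check (reports are written next to the FASTQ files)
        f"fastqc {r1} {r2}",
        # 2) adapter clipping, then trimming of ends below Phred 3, then a length filter
        f"trimmomatic PE {r1} {r2} {t1} {u1} {t2} {u2} "
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:36",
        # 3) mapping; GATK refuses BAMs without a read group, so set one here
        f"bwa mem -R '@RG\\tID:{sample}\\tSM:{sample}' {ref} {t1} {t2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # 4) SNP calling, each subgenome treated as diploid
        #    (freebayes alternative: freebayes -f ref.fa -p 2 sample1.bam > sample1.freebayes.vcf)
        f"gatk HaplotypeCaller -R {ref} -I {bam} -O {vcf} --sample-ploidy 2",
        # 5) effect annotation, used for prioritization
        f"snpEff ann {snpeff_db} {vcf} > {sample}.ann.vcf",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline(REF, R1, R2, SAMPLE, SNPEFF_DB):
        print(cmd)
```

Printing instead of executing is deliberate: it keeps the sketch free of hidden side effects, and each command can be pasted into a job script once the paths are correct.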
My questions are: which checkpoints do I need along the way to get better results, and how should I interpret SNPs from an allotetraploid genome in the VCF file? Answers from anyone with experience would be highly appreciated.
Thank you!
If it is an allotetraploid (is it durum wheat? :P), then you treat each homoeologous genome as a diploid. Think carefully about trimming, though. You want to remove adapter contamination and then filter low-quality reads. Especially with a highly repetitive genome, you don't want to lose too much read length, because shorter reads are more likely to be misaligned. At most, quality trimming should remove bases below Phred 3 from the read ends. For calling you can also use freebayes, and for SNP effect prediction (and thus prioritization) you can also have a look at VEP (Variant Effect Predictor).
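The "remove only Phred < 3 from the ends" rule can be sketched in a few lines. This assumes Phred+33 quality encoding (the standard for current Illumina FASTQ); the function name is made up for illustration, and in practice Trimmomatic's LEADING:3 and TRAILING:3 steps do the same job. Adapter removal is not covered here.

```python
def trim_low_quality_ends(seq: str, qual: str, min_q: int = 3, offset: int = 33) -> tuple[str, str]:
    """Strip bases with Phred < min_q from both ends of a read; internal bases are kept."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(c) - offset for c in qual]  # decode Phred+33 characters to integers
    start = 0
    while start < len(scores) and scores[start] < min_q:  # walk in from the 5' end
        start += 1
    end = len(scores)
    while end > start and scores[end - 1] < min_q:  # walk in from the 3' end
        end -= 1
    return seq[start:end], qual[start:end]


# '#' = Q2, '!' = Q0, 'I' = Q40, '$' = Q3
print(trim_low_quality_ends("ACGTACGT", "#!IIII#$"))
```

In the example the two low-quality leading bases are dropped, while the internal Q2 base and the final Q3 base survive: only the very worst bases at the read ends are removed, so the read keeps as much length as possible for unique mapping.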
Hi @cschu181, how should I treat each homoeologous genome as a diploid? I have a reference genome for my samples (cotton, AADD type).
In the case of allotetraploids, you're not dealing with 4 copies of the same chromosome. Instead, you treat the A and D chromosomes as different entities, with 2 copies each. In practice, this means that if your SNP caller lets you set the ploidy, you can set it to (or leave it at) diploid. For, say, potato, you'd set the ploidy to 4, as it is an autotetraploid (not an allotetraploid).
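To see why the ploidy setting matters, here is a small sketch of what a caller has to consider at one biallelic site. It assumes reads from each homoeolog map to their own A or D reference chromosome, so every site has 2 copies; an autotetraploid such as potato has 4 copies at one locus, so the caller must also consider intermediate dosages. The function names are illustrative only.

```python
from itertools import combinations_with_replacement


def possible_genotypes(n_alleles: int, ploidy: int) -> list[tuple[int, ...]]:
    """Unordered genotypes (multisets of allele indices) a caller must weigh at one site."""
    return list(combinations_with_replacement(range(n_alleles), ploidy))


def expected_alt_fractions(ploidy: int) -> list[float]:
    """Expected fraction of ALT-supporting reads for each ALT dosage at a biallelic site."""
    return [dosage / ploidy for dosage in range(ploidy + 1)]


for ploidy in (2, 4):
    genotypes = ["/".join(map(str, g)) for g in possible_genotypes(2, ploidy)]
    print(ploidy, genotypes, expected_alt_fractions(ploidy))
```

With ploidy 2 there are three genotypes and a heterozygote is expected near 50% ALT reads; with ploidy 4 there are five, including dosages at 25% and 75%. Calling an allotetraploid with ploidy 4 would therefore ask the caller to fit dosage classes that should not exist once A and D are separate reference chromosomes.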
So you mean that in freebayes I have to set the ploidy parameter to 2 rather than 4, right?
Yes, and in GATK HaplotypeCaller you can leave the default setting of 2.
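With ploidy 2, every record in the resulting VCF is an ordinary diploid genotype on either an A or a D chromosome, so one simple first look at the calls is to tally genotype classes per subgenome. The sketch below assumes chromosome names end in `A` or `D` followed by digits (e.g. `A01`, `Ghir_D05`), which is a hypothetical naming scheme; check the names in your own reference. It reads only the first sample column, and the records are made-up toy lines, not real data.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Hypothetical naming scheme: "...A01", "...D05"; anything else counts as unplaced.
SUBGENOME_RE = re.compile(r"([AD])\d+$")


def subgenome_of(chrom: str) -> str:
    m = SUBGENOME_RE.search(chrom)
    return m.group(1) if m else "unplaced"


def classify_gt(gt: str) -> str:
    alleles = re.split(r"[/|]", gt)  # handles unphased "/" and phased "|"
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def tally_by_subgenome(vcf_lines: Iterable[str]) -> Counter:
    """Count (subgenome, genotype class) pairs for the first sample in a VCF."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blanks
        fields = line.rstrip("\n").split("\t")
        chrom, fmt, sample = fields[0], fields[8], fields[9]
        gt = sample.split(":")[fmt.split(":").index("GT")]
        counts[(subgenome_of(chrom), classify_gt(gt))] += 1
    return counts


example_vcf = [  # toy records for illustration only
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "A01\t1000\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:20",
    "A01\t2000\t.\tG\tA\t60\tPASS\t.\tGT:DP\t1/1:18",
    "D01\t1500\t.\tT\tC\t40\tPASS\t.\tGT:DP\t0/1:25",
    "scaffold_12\t10\t.\tA\tG\t30\tPASS\t.\tGT\t./.",
]
for (sub, call), n in sorted(tally_by_subgenome(example_vcf).items()):
    print(sub, call, n)
```

For a real file, pass an open handle, e.g. `tally_by_subgenome(open("sample1.vcf"))`, or use `gzip.open(path, "rt")` for a `.vcf.gz`. If your lines are expected to be largely homozygous, an unusually high share of `het` calls on one subgenome may hint at reads mapped to the wrong homoeolog and is worth a closer look.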