Question

Variant calling for a polyploid species

7.2 years ago

User000 ▴ 750

Dear all,

I am working with a tetraploid species. I have RNA-seq data for several varieties and a reference genome. I am going to do variant calling, possibly to study intra-varietal and inter-homoeologous SNPs.

Is it OK to do variant calling for every variety separately if I have on average around 30K-50K reads (from the read depth plotted over each chromosome)?
I ask because I am not sure what analysis I could do with cohort (joint) variant calling.
When doing variant calling, how can I deal with the problem of homoeologous sequences (i.e. the A and B subgenomes)? I am using:
- HISAT2/TopHat for alignment
- GATK/FreeBayes for variant calling

Any help, such as a link to a publication, is appreciated!

RNA-Seq SNP • 3.8k views

score 4 · Answer 1 · 2018-02-21

7.2 years ago

Nicolas Rosewick 11k

You could start by looking at the GATK Best Practices for variant calling from RNA-seq:

https://github.com/gatk-workflows/gatk3-4-rnaseq-germline-snps-indels

In brief:

Alignment with STAR
SplitNCigarReads from GATK
BQSR (base quality score recalibration) from GATK
HaplotypeCaller from GATK
Variant filtering
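The five steps above can be laid out as command lines. The Python sketch below only builds and prints them; it runs nothing. File names, the STAR index directory and the known-sites VCF are hypothetical placeholders. The linked workflow also contains steps not listed here (such as duplicate marking), BaseRecalibrator needs a known-sites VCF that may not exist for a non-model polyploid, and the filter thresholds are example RNA-seq hard-filter values, not values tuned for a tetraploid.

```python
# Build (but do not run) one command line per step of the RNA-seq list above.
# Assumptions (hypothetical names): gzipped paired-end FASTQ, a prebuilt STAR
# index, GATK4 available as `gatk`, and a known-sites VCF for BQSR.
import shlex
from typing import List


def rnaseq_variant_commands(sample: str, ref: str, star_index: str,
                            r1: str, r2: str,
                            known_sites: str) -> List[List[str]]:
    """Return an argv list for each step, for a single sample."""
    prefix = f"{sample}."
    star_bam = f"{prefix}Aligned.sortedByCoord.out.bam"  # STAR's output name
    split_bam = f"{sample}.split.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    raw_vcf = f"{sample}.raw.vcf.gz"
    filtered_vcf = f"{sample}.filtered.vcf.gz"
    return [
        # 1. Splice-aware alignment with STAR in two-pass mode; the read-group
        #    line gives GATK the sample name (SM) it requires.
        ["STAR", "--genomeDir", star_index,
         "--readFilesIn", r1, r2, "--readFilesCommand", "zcat",
         "--twopassMode", "Basic",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outSAMattrRGline", f"ID:{sample}", f"SM:{sample}",
         "--outFileNamePrefix", prefix],
        # 2. Split reads with N (intron) CIGAR operations into exon segments.
        ["gatk", "SplitNCigarReads", "-R", ref, "-I", star_bam, "-O", split_bam],
        # 3a. BQSR: model systematic base-quality errors ...
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", split_bam,
         "--known-sites", known_sites, "-O", recal_table],
        # 3b. ... and apply the model to the BAM.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", split_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # 4. Variant calling; soft-clipped bases are ignored for RNA-seq.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam, "-O", raw_vcf,
         "--dont-use-soft-clipped-bases"],
        # 5. Hard filters (example thresholds, not tuned for polyploids).
        ["gatk", "VariantFiltration", "-R", ref, "-V", raw_vcf,
         "-O", filtered_vcf,
         "--filter-name", "FS", "--filter-expression", "FS > 30.0",
         "--filter-name", "QD", "--filter-expression", "QD < 2.0"],
    ]


if __name__ == "__main__":
    steps = rnaseq_variant_commands("variety1", "ref.fa", "star_index",
                                    "variety1_R1.fq.gz", "variety1_R2.fq.gz",
                                    "known_sites.vcf.gz")
    for argv in steps:
        print(shlex.join(argv))  # quoted so "FS > 30.0" is shell-safe
```

Printing with `shlex.join` keeps the filter expressions quoted, so the output can be pasted into a shell after checking it.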

Hi Nicolas, thank you! I have already done all of these steps, but I have the very specific questions I stated above.

Reply by User000 ▴ 750 · 7.2 years ago

OK. Maybe you should edit your question to add the analysis you have already done.

Reply by Nicolas Rosewick 11k · 7.2 years ago

I did state the programs I am using. It is not difficult to find the best practices, but I think things are less straightforward for a polyploid species, so I wanted to know whether there are particular parameters or filtering steps to use. Thank you anyway.

Reply by User000 ▴ 750 · 7.2 years ago
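On the questions of joint versus per-variety calling and of polyploid-specific parameters, here is a hedged sketch that builds (but does not run) joint-calling commands. Joint genotyping reports every variety at every variable site, which makes between-variety comparisons easier. Both GATK HaplotypeCaller (`--sample-ploidy`) and FreeBayes (`--ploidy`) take a ploidy setting. With a reference already split into A and B subgenomes, each subgenome is diploid, so 2 is a plausible starting point, while 4 would better match a reference where homoeologs collapse onto one sequence; treat `PLOIDY` as an assumption to test, not a recommendation from this thread. BAM names are hypothetical.

```python
# Joint (cohort) calling across varieties: build, but do not run, commands.
# Assumptions (hypothetical): one analysis-ready BAM per variety, each with a
# read-group SM tag; GATK4 as `gatk`; FreeBayes on PATH.
import shlex
from typing import Dict, List

PLOIDY = 2  # hypothetical default: one diploid subgenome per reference contig


def gatk_joint(ref: str, bams: Dict[str, str], cohort: str = "cohort",
               ploidy: int = PLOIDY) -> List[List[str]]:
    """Per-variety GVCFs, then one joint genotyping step for all varieties."""
    cmds: List[List[str]] = []
    gvcfs: List[str] = []
    for sample, bam in bams.items():
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        cmds.append(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
                     "-O", gvcf, "-ERC", "GVCF",
                     "--sample-ploidy", str(ploidy),
                     "--dont-use-soft-clipped-bases"])
    combine = ["gatk", "CombineGVCFs", "-R", ref, "-O", f"{cohort}.g.vcf.gz"]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]
    cmds.append(combine)
    cmds.append(["gatk", "GenotypeGVCFs", "-R", ref,
                 "-V", f"{cohort}.g.vcf.gz", "-O", f"{cohort}.vcf.gz"])
    return cmds


def freebayes_joint(ref: str, bams: Dict[str, str],
                    ploidy: int = PLOIDY) -> List[str]:
    """FreeBayes calls all BAMs together in one run; the VCF goes to stdout."""
    return ["freebayes", "-f", ref, "--ploidy", str(ploidy), *bams.values()]


if __name__ == "__main__":
    bams = {"variety1": "variety1.bam", "variety2": "variety2.bam"}
    for argv in gatk_joint("ref.fa", bams):
        print(shlex.join(argv))
    print(shlex.join(freebayes_joint("ref.fa", bams)) + " > cohort.freebayes.vcf")
```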

score 0 · Answer 2 · 2018-02-21

7.2 years ago

Kritika ▴ 270

"Is it OK to do variant calling for every variety separately if I have on average around 30K-50K reads (from read depth plotted over chromosome)?"

It is fine to do variant calling per variety, but my concern is what depth threshold you should choose for filtering out low-depth sites, given that you have RNA-seq data. Usually it is better to use WGS data for variant calling.
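One way to act on the depth concern is a minimal Python sketch that keeps only VCF records whose INFO/DP reaches a minimum. It assumes the caller writes DP in the INFO column, which both GATK HaplotypeCaller and FreeBayes do; `MIN_DP = 10` is a hypothetical value that should be chosen from your own depth distribution.

```python
# Keep VCF records whose total depth (INFO/DP) reaches a minimum.
# Assumptions: INFO carries DP; MIN_DP is a hypothetical cutoff.
from typing import Iterable, Iterator, Optional

MIN_DP = 10  # hypothetical threshold


def info_dp(info: str) -> Optional[int]:
    """Return DP from a VCF INFO column, or None if the key is absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "DP" and value:
            return int(value)
    return None


def depth_filter(vcf_lines: Iterable[str],
                 min_dp: int = MIN_DP) -> Iterator[str]:
    """Yield header lines unchanged and data lines with DP >= min_dp."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        dp = info_dp(cols[7])  # INFO is the 8th VCF column
        if dp is not None and dp >= min_dp:
            yield line
```

On real files, `bcftools view -i 'INFO/DP>=10' in.vcf.gz` applies the same kind of filter. Because RNA-seq depth follows expression level, a single genome-wide cutoff is a blunt tool: it removes poorly expressed genes along with genuinely unreliable sites.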

"When doing variant calling, how can I deal with the problem of homoeologous sequences (i.e. the A and B subgenomes)? I am using HISAT2/TopHat for alignment and GATK/FreeBayes for variant calling."

I would normally use BWA for SNP calling, but since you have RNA-seq data I would go with HISAT2 for alignment and FreeBayes for variant calling. Split your reference into the A and B subgenomes, for example chromosomes A1-A3 and B1-B3 (six chromosomes in total). That way, any homoeologous read that maps to A may also map to B, and you will get a clear picture of this.
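To see how often reads land on both subgenomes, you can group alignments by read name and record which subgenome each one hits. This sketch assumes chromosome names start with `A` or `B` (as in the A1-A3/B1-B3 example) and that multi-mapped reads appear as additional secondary SAM records; both are assumptions about your data, not guarantees.

```python
# Report reads whose alignments fall on both the A and the B subgenome.
# Assumptions (hypothetical): chromosome names begin with "A" or "B", and
# multi-mapped reads appear as extra (secondary) SAM records.
from collections import defaultdict
from typing import Dict, Iterable, Set


def subgenome(chrom: str) -> str:
    """Subgenome label, taken here as the first character of the name."""
    return chrom[:1]


def cross_subgenome_reads(sam_lines: Iterable[str]) -> Set[str]:
    """Return names of reads with at least one alignment on A and one on B."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        cols = line.rstrip("\n").split("\t")
        qname, flag, rname = cols[0], int(cols[1]), cols[2]
        if flag & 0x4 or rname == "*":
            continue  # unmapped record
        hits[qname].add(subgenome(rname))
    return {name for name, subs in hits.items() if {"A", "B"} <= subs}
```

SAM text can be streamed from a BAM with `samtools view -h aligned.bam`. With paired-end data both mates share a read name, so a pair whose mates sit on different subgenomes is also reported.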

Hi, thanks! My chromosomes are already divided into A and B, but when I do variant calling, isn't this going to produce false calls, since reads will map to both A and B? How should I deal with this?

Reply by User000 ▴ 750 · 7.2 years ago

This is the reason I said to use BWA or Bowtie for SNP calling: these tools map reads with high confidence and accuracy. But since you have RNA-seq data, it is better to map with HISAT2, because HISAT2 and TopHat are splice-aware: they can align reads across introns (exon-exon junctions) and place them in protein-coding regions.

Reply by Kritika ▴ 270 · 7.2 years ago
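Following the point about mapping confidence, a common mitigation is to call variants only from alignments that look uniquely placed. A minimal sketch, assuming the aligner writes the NH tag (HISAT2 and STAR both do) and using a hypothetical MAPQ cutoff of 10: it drops unmapped, secondary and supplementary records, low-MAPQ records, and reads reported at more than one locus.

```python
# Keep only alignments that look uniquely placed before calling variants.
# Assumptions: the aligner writes the NH tag (number of reported alignments);
# MIN_MAPQ is a hypothetical cutoff.
from typing import Iterable, Iterator, List, Optional

MIN_MAPQ = 10  # hypothetical threshold
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def nh_tag(cols: List[str]) -> Optional[int]:
    """Return NH from the optional SAM fields (column 12 onward), if present."""
    for tag in cols[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def unique_alignments(sam_lines: Iterable[str],
                      min_mapq: int = MIN_MAPQ) -> Iterator[str]:
    """Yield header lines and primary alignments with MAPQ >= min_mapq and NH 1."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        flag, mapq = int(cols[1]), int(cols[4])
        if flag & SKIP_FLAGS or mapq < min_mapq:
            continue
        nh = nh_tag(cols)
        if nh is None or nh == 1:  # no NH tag: rely on MAPQ alone
            yield line
```

The MAPQ and flag parts correspond roughly to `samtools view -q 10 -F 0x904`. The trade-off is lost coverage in genes where the A and B copies are nearly identical, which is exactly where homoeologous SNPs are hardest to resolve.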

score 0 · Answer 3 · 2018-03-15

7.2 years ago

Omics data mining ▴ 260

Hello

You can check SNiPloid https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3791807/

This article can help you go further in predicting homoeologous SNPs.
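A much simpler idea than SNiPloid's, shown only to make the distinction concrete: a difference between the A and B copies (whether seen because reads are mapped to a single diploid-like reference or because they cross-map between subgenomes) tends to appear as a heterozygous call in every variety, whereas a varietal SNP differs between varieties. The sketch below is a hypothetical heuristic, not SNiPloid's method; it assumes diploid-style GT strings (`0/1`, `1/1`, ...) from one multi-sample VCF site and ignores depth and allele balance.

```python
# Toy classification of one multi-sample site (NOT SNiPloid's algorithm).
# Assumption: diploid-style GT strings, one per variety.
from typing import Dict

HET_GTS = {"0/1", "0|1", "1|0"}


def classify_site(genotypes: Dict[str, str]) -> str:
    """Label a site from its per-variety genotypes (hypothetical heuristic)."""
    called = [gt for gt in genotypes.values() if "." not in gt]
    if not called:
        return "no_call"
    if all(gt in HET_GTS for gt in called):
        # Same "heterozygous" state everywhere: more consistent with an A/B
        # (homoeologous) difference than with variation between varieties.
        return "putative_homoeologous"
    if len(set(called)) > 1:
        return "putative_varietal"
    return "monomorphic_among_varieties"
```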

I hope it will help you.
