Variant calling from 5 Mb regions from contrasting cultivars
3.3 years ago
VenGeno ▴ 100

Hi,

I would like to compare ~5 Mb genomic (QTL) regions between two groups (resistant and susceptible) and identify variants that might strongly influence resistance. I was thinking of the following pipeline:

  • Use the susceptible cultivar as the reference (since there is only one)
  • Assemble the tolerant group (3 cultivars) using the above as the reference
  • Call variants

Since I won't be using reads here, can you suggest the most appropriate tools for this purpose? I wasn't sure whether it is appropriate to use a short-read assembly and variant calling pipeline (especially since those ask for the sequencing platform). I would really appreciate any help you can provide.

calling alignment variant • 1.1k views

What do you mean by assembling the 3 cultivars using the reference? What kind of variants are you looking for?


Hi Samuel,

Let's say the susceptible one is "A" and the resistant ones are "B" and "C". I was thinking of using A as the reference and then assembling B and C against it. What I want is to identify both SNPs and indels found in B and C compared to A, so I can see whether there is an association between the phenotype and these variants. I hope that makes it clear. Thank you!


OK, so a de novo assembly of each cultivar, right? Do you have long reads?

Well, if you are interested in just SNPs and indels, then paftools/minimap2 works well. It will also give you large indels/SVs. Other tools, MUM&Co (my own) and similarly SyRI, will give a full range of SVs.

I hope the assembly process is simple (homozygous, non-repetitive) and you have a polishing pipeline. If so, perhaps you have short reads, and you can also call variants using them and compare.
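
For anyone wanting to try the paftools/minimap2 route on already assembled sequences, here is a minimal sketch in Python that only builds the shell pipeline; it does not run anything by itself. The file names (A_qtl.fa, B_qtl.fa, ...) and the function name are placeholders I made up, and the asm5 preset is an assumption that the cultivars are closely related; a more permissive preset such as asm10 or asm20 may suit more divergent sequences. It also assumes minimap2 and paftools.js are installed and on your PATH.

```python
import shlex


def build_call_pipeline(ref_fa, query_fa, out_vcf, preset="asm5"):
    """Return a shell pipeline that aligns one assembled region to the
    reference region, sorts the PAF by reference position, and calls
    variants with paftools.js (VCF output is enabled by -f)."""
    ref, qry, out = (shlex.quote(p) for p in (ref_fa, query_fa, out_vcf))
    # -c: output CIGAR, --cs: output the cs tag that paftools.js call needs
    align = f"minimap2 -cx {preset} --cs {ref} {qry}"
    # paftools.js call expects input sorted by reference name and start
    sort_paf = "sort -k6,6 -k8,8n"
    call = f"paftools.js call -f {ref} -"
    return f"{align} | {sort_paf} | {call} > {out}"


if __name__ == "__main__":
    # Hypothetical inputs: A is the susceptible reference, B and C are resistant
    for name in ("B", "C"):
        print(build_call_pipeline("A_qtl.fa", f"{name}_qtl.fa", f"{name}_vs_A.vcf"))
```

Each printed line can be pasted into a shell, or passed to subprocess.run(cmd, shell=True, check=True). Note that paftools.js call only uses alignments above a minimum length, so check its options if a region aligns in many short pieces.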


Thank you. I will check out the tools you suggested. Sorry if I am confusing you again. That's the part I am also not so sure about, since my input is neither long reads nor short reads but a genomic region from already sequenced cultivars. The sequences I have are QTL regions of already assembled genomes (from the rice 3000 genome project). I retrieved the QTL region (a 5 Mb stretch) from each cultivar and would like to compare them (susceptible A vs. resistant B and C). We have strong clues that our resistance is associated with this region. So the idea is to find potential variants, then look at their potential impact in silico, followed by wet lab studies.
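
Once there is one VCF per resistant cultivar called against A, one possible first filter is to keep only the variants present in every resistant cultivar. This is a rough sketch under my own assumptions, not an established method: the function names are made up, variants are matched on exact (chromosome, position, REF, ALT), and indels are assumed to be represented identically across files (in practice they may need normalizing first).

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the lines of one VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per ALT allele
            keys.add((chrom, int(pos), ref, allele))
    return keys


def shared_variants(per_sample_keys):
    """Return the sorted variants found in every sample's key set."""
    sets = list(per_sample_keys)
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    # Toy records standing in for B-vs-A and C-vs-A calls (made-up data)
    b_vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr11\t100\t.\tA\tG\t60\t.\t.",
        "chr11\t250\t.\tAT\tA\t60\t.\t.",
    ]
    c_vcf = [
        "chr11\t100\t.\tA\tG\t60\t.\t.",
        "chr11\t900\t.\tC\tT\t60\t.\t.",
    ]
    for variant in shared_variants([variant_keys(b_vcf), variant_keys(c_vcf)]):
        print(variant)
```

Because A is the reference in every comparison, each variant in these files already differs from A, so the intersection gives the differences common to the resistant cultivars. With only a few cultivars this narrows the candidate list rather than demonstrating an association.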
