Question

How to call variants in RNA-seq data from multiple samples?

7.4 years ago

Lila M ★ 1.3k

Hi everybody! I would like to call variants in RNA-seq data from multiple samples. Say I have two different groups and I would like to know the differences between them. Each study group consists of ~ BAM files. This is the first time I have done variant calling. I have also read some posts, such as "Workflow Or Tutorial For Snp Calling?", which has a very clear workflow. I would like to know whether I have to merge all the BAM files for each group into a single SAM file after marking duplicates. Would that be correct, or should I use another pipeline specific to this analysis?

Thank you very very much in advance!

RNA-Seq GATK VC variant calling • 3.2k views

score 4 · Answer 1 · 2017-07-13

7.4 years ago

andrew.j.skelton73 6.6k

See the GATK "Best Practices" for RNA-seq data here. Fair warning: you're trying to do something that the data was never intended for. If you have specific SNPs in mind that you want to look at, I'd strongly recommend just looking at the alignments in IGV and eyeballing them. This protocol is a lot of work to get formal variant calls that won't offer you much more than eyeballing. Also, if you're looking for post-transcriptional modifiers, then distinguishing them from noise is extremely difficult.

So the only way is to call variants in each sample separately and then eyeball them one by one? Is there no other way to do variant calling across a whole population, not even with another program?

Thank you!

Reply by Lila M ★ 1.3k • 7.4 years ago

I'm not saying to call variants and then eyeball them; I'm saying to look at your RNA-seq alignments in IGV and see whether anything stands out in whatever region you're interested in.

You apply tools like HaplotypeCaller on a sample-by-sample basis. However, if you have a decent number of samples you can do joint calling, although I don't recall that being benchmarked for RNA-seq data.

If you're asking "What's my causal variant based on all my samples?", that depends on your hypothesis. There are tools such as VEP to predict the effect of variants. If you want to look at Mendelian inheritance or something along those lines, take a look at GEMINI, MendelMD, or the tool list available at OMICtools.
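As a rough illustration of the sample-by-sample approach mentioned above, here is a minimal Python sketch that only builds (it does not run) one HaplotypeCaller command per BAM file. The file names, the `vcf` output directory and the helper `haplotypecaller_cmd` are hypothetical. The flags follow the GATK4 command-line style as I understand it, and this thread predates GATK4, so treat them as assumptions and check them against the documentation for your GATK version. The sketch also assumes each BAM has already been through the earlier RNA-seq preprocessing steps (duplicate marking, splitting reads that span splice junctions, and so on).

```python
from pathlib import Path


def haplotypecaller_cmd(bam, reference, out_dir="vcf"):
    """Build (but do not run) a per-sample HaplotypeCaller command line.

    Assumption: GATK4-style flags; verify against your GATK version.
    """
    sample = Path(bam).stem  # "ctrl_1" from "ctrl_1.bam"
    out_vcf = (Path(out_dir) / f"{sample}.vcf.gz").as_posix()
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_vcf,
        # Commonly suggested for RNA-seq so soft-clipped bases are ignored.
        "--dont-use-soft-clipped-bases",
    ]


# Hypothetical input files: one BAM per sample, across both groups.
bams = ["ctrl_1.bam", "ctrl_2.bam", "treated_1.bam"]
commands = [haplotypecaller_cmd(bam, "genome.fa") for bam in bams]

for cmd in commands:
    print(" ".join(cmd))
```

Each printed line could then be submitted as its own job, which keeps the samples independent until the VCFs are compared or combined.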

Reply by andrew.j.skelton73 6.6k • 7.4 years ago

And what about calling variants independently and then merging the VCF files?

Thanks for all the information!

Reply by Lila M ★ 1.3k • 7.4 years ago

You can use VCFtools to make a multi-sample VCF; see the vcf-merge utility.

Reply by andrew.j.skelton73 6.6k • 7.4 years ago

I was thinking about that. So this approach should work after following the GATK "Best Practices" for RNA-seq data, right?

Reply by Lila M ★ 1.3k • 7.4 years ago

If you make a VCF for each run of HaplotypeCaller (i.e. each sample), then you can combine them into a multi-sample VCF using VCFtools.
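To make concrete what combining per-sample calls produces, here is a toy Python sketch. It is not how vcf-merge is implemented; the sample names, sites and the helper `merge_calls` are made up for illustration. The point it shows is that a site called in only some samples ends up as a missing genotype (`./.`) in the others. In RNA-seq that can simply mean the gene was not expressed or not covered in that sample, so it should not be read as homozygous reference.

```python
def merge_calls(per_sample):
    """Merge {sample: {(chrom, pos, ref, alt): genotype}} into one table.

    Returns (samples, rows); each row is (site, [genotype per sample]).
    A sample with no record at a site gets "./." (missing), not "0/0".
    """
    samples = sorted(per_sample)
    sites = sorted({site for calls in per_sample.values() for site in calls})
    rows = [
        (site, [per_sample[s].get(site, "./.") for s in samples])
        for site in sites
    ]
    return samples, rows


# Hypothetical per-sample calls, as if parsed from three single-sample VCFs.
calls = {
    "ctrl_1": {("chr1", 1000, "A", "G"): "0/1"},
    "ctrl_2": {("chr1", 1000, "A", "G"): "1/1", ("chr2", 500, "C", "T"): "0/1"},
    "treated_1": {("chr2", 500, "C", "T"): "0/1"},
}

samples, rows = merge_calls(calls)
print("site", *samples, sep="\t")
for (chrom, pos, ref, alt), genotypes in rows:
    print(f"{chrom}:{pos} {ref}>{alt}", *genotypes, sep="\t")
```

When comparing the two groups from a merged file, it is worth checking coverage at each missing site in the original BAMs before concluding that a variant is absent from a sample.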

Reply by andrew.j.skelton73 6.6k • 7.4 years ago

Yes, that's the idea! Thanks for the confirmation!

Reply by Lila M ★ 1.3k • 7.4 years ago

score 1 · Answer 2 · 2017-07-15

7.4 years ago

Samuel Brady ▴ 330

We have successfully used UNCeqR to call mutations in RNA-seq data. The first author of this paper is very responsive to questions if you need help with it. The GitHub page is also very helpful.