How to call variants in RNA-seq data from multiple samples?
7.4 years ago
Lila M ★ 1.3k

Hi everybody! I would like to call variants in multiple RNA-seq samples. Let's say that I have two different groups and I would like to know the differences between them. Each study group consists of ~ BAM files. This is the first time I'm doing variant calling. I've read some posts, such as "Workflow Or Tutorial For Snp Calling?", which has a very clear workflow. I would like to know whether I should merge all the BAM files of each group into a single SAM file after marking duplicates. Would that be correct, or should I use another, more specific pipeline for this analysis?

Thank you very very much in advance!

RNA-Seq GATK VC variant calling • 3.2k views
7.4 years ago

See the GATK "Best Practices" for RNA-seq data. Fair warning: you're trying to do something that the data was never intended for. If you have specific SNPs in mind that you want to look at, I'd strongly recommend simply looking at the alignments in IGV and eyeballing them. This protocol is a lot of work to get formal variant calls that won't offer you much more than eyeballing. Also, if you're looking for post-transcriptional modifications, distinguishing them from noise is extremely difficult.
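The GATK RNA-seq Best Practices are a per-sample chain of steps. As a rough sketch only, assuming GATK4's `gatk` wrapper, a reference FASTA, and BAMs that are already aligned with a splice-aware aligner, sorted, and carrying read groups, the core commands could be assembled like this. Base quality recalibration and variant filtration are deliberately left out, and all file names are hypothetical.

```python
from __future__ import annotations


def rnaseq_gatk_commands(sample: str, bam: str, reference: str) -> list[list[str]]:
    """Build (but do not run) GATK4 commands for one RNA-seq sample.

    Simplified sketch: assumes `bam` is already aligned, sorted and has
    read groups; BQSR and VariantFiltration are omitted on purpose.
    """
    marked = f"{sample}.markdup.bam"   # hypothetical output names
    split = f"{sample}.split.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # 1. Flag PCR/optical duplicates and write a metrics file.
        ["gatk", "MarkDuplicates", "-I", bam, "-O", marked,
         "-M", f"{sample}.markdup_metrics.txt"],
        # 2. Split reads spanning introns (N in the CIGAR) into exon segments.
        ["gatk", "SplitNCigarReads", "-R", reference, "-I", marked, "-O", split],
        # 3. Call variants for this one sample, ignoring soft-clipped bases.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", split, "-O", vcf,
         "--dont-use-soft-clipped-bases",
         "--standard-min-confidence-threshold-for-calling", "20"],
    ]


if __name__ == "__main__":
    for cmd in rnaseq_gatk_commands("sampleA", "sampleA.bam", "ref.fasta"):
        print(" ".join(cmd))
        # To actually execute a step: subprocess.run(cmd, check=True)
```

Building the commands as lists keeps them easy to inspect before running anything, and the same function can simply be looped over every sample in each group.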


So the only way is to call variants in each sample separately and then eyeball them one by one? Is there no other way to do variant calling on a whole population? Not even with another program?

Thank you!


I'm not saying to call variants and then eyeball them; I'm saying to look at your RNA-seq alignments in IGV and see whether something lights up in whatever region you're interested in.

You apply tools like HaplotypeCaller on a sample-by-sample basis. However, if you have a decent number of samples, you can do joint calling, although I don't recall that being benchmarked for RNA-seq data.
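For illustration, GVCF-based joint calling in GATK4 runs HaplotypeCaller once per sample in GVCF mode, combines the per-sample GVCFs, and genotypes them together. The sketch below only builds the command lines; the sample-to-BAM mapping and file names are hypothetical, and, as noted above, this mode is not an established part of the RNA-seq Best Practices, so treat it as an experiment rather than a recommended pipeline.

```python
from __future__ import annotations


def joint_calling_commands(samples: dict[str, str], reference: str,
                           cohort: str = "cohort") -> list[list[str]]:
    """Build (not run) GATK4 commands for GVCF-based joint genotyping.

    `samples` maps a sample name to its analysis-ready BAM (hypothetical paths).
    """
    cmds: list[list[str]] = []
    gvcfs: list[str] = []
    # 1. One HaplotypeCaller run per sample, emitting a GVCF instead of a VCF.
    for name, bam in samples.items():
        gvcf = f"{name}.g.vcf.gz"
        gvcfs.append(gvcf)
        cmds.append(["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
                     "-O", gvcf, "-ERC", "GVCF",
                     "--dont-use-soft-clipped-bases"])
    # 2. Combine the per-sample GVCFs into a single cohort GVCF.
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]
    combine += ["-O", f"{cohort}.g.vcf.gz"]
    cmds.append(combine)
    # 3. Genotype all samples together into one multisample VCF.
    cmds.append(["gatk", "GenotypeGVCFs", "-R", reference,
                 "-V", f"{cohort}.g.vcf.gz", "-O", f"{cohort}.vcf.gz"])
    return cmds


if __name__ == "__main__":
    for cmd in joint_calling_commands({"a": "a.bam", "b": "b.bam"}, "ref.fa"):
        print(" ".join(cmd))
```

The advantage of the GVCF route is that the final VCF records, for every sample, whether a site was confidently reference or simply not covered.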

If you're asking "What's my causal variant based on all my samples?", that depends on your hypothesis. There are tools such as VEP to predict the effect of variants. If you want to study Mendelian inheritance or something along those lines, take a look at GEMINI or MendelMD, or browse the tool list available at OMICtools.


And what about calling variants independently and then merging the VCF files?

Thanks for all the information!


You can use VCFtools to make a multisample VCF; see the vcf-merge tool.
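With VCFtools itself, the per-sample files are bgzip-compressed and tabix-indexed and then passed to `vcf-merge`, which writes the merged VCF to standard output (e.g. `vcf-merge A.vcf.gz B.vcf.gz | bgzip -c > merged.vcf.gz`). To show what the result looks like, here is a toy sketch, not vcf-merge's actual implementation: it ignores headers, multiallelic sites and all other fields, and just lines up one genotype column per sample.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def merge_genotypes(per_sample: Dict[str, Dict[Site, str]]) -> List[List[str]]:
    """Toy multisample merge: one row per site, one GT column per sample.

    Sites a sample did not report get './.' (missing), because a plain
    per-sample VCF cannot tell 'reference' apart from 'not covered'.
    """
    samples = sorted(per_sample)
    # Union of all sites seen in any sample (sorted lexicographically by CHROM).
    sites = sorted({site for calls in per_sample.values() for site in calls})
    rows = [["#CHROM", "POS", "REF", "ALT", *samples]]
    for site in sites:
        chrom, pos, ref, alt = site
        gts = [per_sample[s].get(site, "./.") for s in samples]
        rows.append([chrom, str(pos), ref, alt, *gts])
    return rows


if __name__ == "__main__":
    calls = {
        "sampleA": {("chr1", 1000, "A", "G"): "0/1"},
        "sampleB": {("chr1", 1000, "A", "G"): "1/1",
                    ("chr1", 2000, "C", "T"): "0/1"},
    }
    for row in merge_genotypes(calls):
        print("\t".join(row))
```

The `./.` entries are the main caveat of merging independently called VCFs: for RNA-seq, a missing call may just mean the gene was not expressed in that sample.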


I was thinking about that. So this approach should work after following the GATK "Best Practices" for RNA-seq data, right?


If you make a VCF for each run of HaplotypeCaller (i.e. for each sample), you can then combine them into a multisample VCF using VCFtools.
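Once the samples are in one multisample VCF, a first look at differences between the two groups can be as simple as the ALT allele frequency per group at each site. The sketch below is only a descriptive summary, not a statistical test; the sample names and the sample-to-group mapping are hypothetical, and missing genotypes (`./.`) are skipped rather than treated as reference.

```python
from typing import Dict, Iterable, List, Optional


def alt_allele_frequency(genotypes: Iterable[str]) -> Optional[float]:
    """Fraction of ALT alleles among called alleles; '.' alleles are skipped."""
    alt = total = 0
    for gt in genotypes:
        # Handle both unphased ('0/1') and phased ('0|1') genotypes.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None


def group_frequencies(row: Dict[str, str],
                      groups: Dict[str, str]) -> Dict[str, Optional[float]]:
    """row: sample -> GT at one site; groups: sample -> group label."""
    by_group: Dict[str, List[str]] = {}
    for sample, gt in row.items():
        by_group.setdefault(groups[sample], []).append(gt)
    return {g: alt_allele_frequency(gts) for g, gts in by_group.items()}


if __name__ == "__main__":
    site = {"s1": "0/1", "s2": "1/1", "s3": "0/0", "s4": "./."}
    groups = {"s1": "case", "s2": "case", "s3": "control", "s4": "control"}
    print(group_frequencies(site, groups))
```

A formal comparison would need a proper test per site and should keep in mind that RNA-seq genotypes are influenced by expression level and allele-specific expression.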


Yes, that's the idea! Thanks for the confirmation!

7.4 years ago
Samuel Brady ▴ 330

We have successfully used UNCeqR to call mutations in RNA-seq data. The first author of the paper is very responsive to questions if you need help with it. The GitHub page is also very helpful.


Define 'successfully'?
