Hi everybody! I would like to call variants in multiple RNA-Seq samples. Let's say that I have two different groups and I would like to know the differences between them. Each study group consists of ~ bam files. This is the first time I have done variant calling. I've also read some posts, such as "Workflow Or Tutorial For Snp Calling?", which has a very clear workflow. I would like to know whether I have to merge all the bam files for each group into a single sam file after marking the duplicates. Would that be correct, or should I use another specific pipeline for this analysis?
Thank you very very much in advance!
So the only way is to call variants in each sample separately and then eyeball them one by one? Is there no other way to do variant calling in a whole population? Not even with another program?
Thank you!
I'm not saying you should call variants and then eyeball them. I'm saying you should look at your RNA-Seq alignments in IGV and see whether anything lights up in whatever region you're interested in.
You apply tools like the HaplotypeCaller on a sample-by-sample basis. However, if you have a decent number of samples you can do joint calling, although I don't recall that being benchmarked for RNA-Seq data.
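To make the sample-by-sample idea concrete, here is a minimal sketch that only builds one HaplotypeCaller command per BAM file and prints it. It assumes GATK4-style syntax (older GATK versions use different flags), and the file names, the `vcfs` output directory and the helper name `haplotype_caller_cmd` are all made up for illustration. It does not cover the preprocessing steps you would run before calling.

```python
from pathlib import Path


def haplotype_caller_cmd(bam, reference, out_dir="vcfs"):
    """Build a GATK4-style HaplotypeCaller command for a single BAM file.

    Assumption: one BAM per sample, already preprocessed.
    """
    sample = Path(bam).stem  # "bams/groupA_01.bam" -> "groupA_01"
    out_vcf = (Path(out_dir) / f"{sample}.vcf.gz").as_posix()
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_vcf,
        # Commonly used for RNA-Seq: ignore soft-clipped bases when calling.
        "--dont-use-soft-clipped-bases",
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with your own BAMs and reference.
    for bam in ["bams/groupA_01.bam", "bams/groupB_01.bam"]:
        print(" ".join(haplotype_caller_cmd(bam, "genome.fa")))
```

Each printed line is one independent run, which is what "sample by sample" means here. You could hand the lists to `subprocess.run` instead of printing them once you are happy with the commands.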
If you're asking "What's my causal variant based on all my samples?", that depends on your hypothesis. There are tools such as VEP to predict the effect of variants. If you want to look at Mendelian inheritance or something along those lines, take a look at GEMINI, MendelMD, or the tool list available at omicstools.
And what about calling variants independently and then merging the VCF files?
Thanks for all the information!
You can use VCFtools to make a multisample VCF; see the vcf-merge function.
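As a rough sketch of what that looks like on the command line: vcf-merge expects bgzip-compressed, tabix-indexed inputs and writes the merged VCF to standard output. The snippet below only assembles the shell commands as strings. The file names and the helper names `tabix_index_cmd` and `vcf_merge_cmd` are hypothetical.

```python
import shlex


def tabix_index_cmd(vcf_gz):
    """Command to index one bgzip-compressed VCF with tabix."""
    return f"tabix -p vcf {shlex.quote(vcf_gz)}"


def vcf_merge_cmd(vcfs, merged="merged.vcf.gz"):
    """Command that merges per-sample VCFs and compresses the result."""
    if len(vcfs) < 2:
        raise ValueError("need at least two VCF files to merge")
    inputs = " ".join(shlex.quote(v) for v in vcfs)
    # vcf-merge prints to stdout, so pipe it through bgzip into a file.
    return f"vcf-merge {inputs} | bgzip -c > {shlex.quote(merged)}"


if __name__ == "__main__":
    # Hypothetical per-sample VCFs, one per HaplotypeCaller run.
    vcfs = ["vcfs/groupA_01.vcf.gz", "vcfs/groupB_01.vcf.gz"]
    for vcf in vcfs:
        print(tabix_index_cmd(vcf))
    print(vcf_merge_cmd(vcfs))
```

Printing the commands first makes it easy to check them by eye before running anything.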
I was thinking about that. So this approach should work after following the GATK "best practices" for RNA-Seq data, right?
If you make a VCF for each run of the HaplotypeCaller (i.e. each sample), then you can combine them into a multisample VCF using VCFtools.
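If it helps to picture what the combined file contains, here is a toy illustration in plain Python. It is not how VCFtools is implemented, and the sample names, sites and the `./.` placeholder are invented for the example. The point is that a site called in only one sample ends up with a missing genotype in the others, and with RNA-Seq that may simply mean the gene was not expressed there, not that the sample matches the reference.

```python
def merge_calls(per_sample):
    """Combine per-sample calls into one row per site.

    per_sample maps sample name -> {(chrom, pos, ref, alt): genotype}.
    Samples without a call at a site get the placeholder "./.".
    """
    samples = sorted(per_sample)
    sites = sorted({site for calls in per_sample.values() for site in calls})
    rows = []
    for site in sites:
        genotypes = [per_sample[s].get(site, "./.") for s in samples]
        rows.append((site, genotypes))
    return samples, rows


if __name__ == "__main__":
    # Invented calls for two samples.
    calls = {
        "sampleA": {("chr1", 100, "A", "G"): "0/1"},
        "sampleB": {("chr1", 100, "A", "G"): "1/1",
                    ("chr1", 250, "C", "T"): "0/1"},
    }
    samples, rows = merge_calls(calls)
    print("site", *samples, sep="\t")
    for site, genotypes in rows:
        print(site, *genotypes, sep="\t")
```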
Yes, that's the idea! Thanks for the confirmation!