Hello,
I need to call SNPs from pooled RNA-seq data. In each pool, I have 2 or 3 individuals. I am following the GATK pipeline for this purpose.
What I wonder about (and this could be due to my limited understanding of SNP calling from pooled data) is this: if one sees two alleles at a given position, how can one tell whether the position is heterozygous in one of the individuals or whether the two alleles come from two individuals that are each homozygous for a different allele?
I would appreciate some comments on this. Thanks.
Thanks for your comment. From individually sequenced RNA-seq data it is possible to see whether an individual is homozygous or heterozygous at a given position, especially since one can always check the BAM files against the reference. But as you said, knowing whether a position is homozygous or heterozygous in pooled samples is impossible; the only information is that the position is variable in the population.
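To make the pooled-sample ambiguity concrete, here is a minimal sketch that enumerates the genotype combinations in a pool and groups them by the alt-allele fraction one would expect to see. It assumes diploid individuals, a biallelic site, and equal contribution of every individual and allele to the reads, which RNA-seq generally does not guarantee. The function name `pool_configurations` is made up for this illustration and is not part of GATK.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations_with_replacement

# Number of alt-allele copies carried by each diploid genotype.
GENOTYPES = {"ref/ref": 0, "ref/alt": 1, "alt/alt": 2}


def pool_configurations(n_individuals):
    """Group unordered genotype combinations by expected alt-allele fraction.

    Assumes every individual contributes equally to the pool (hypothetical
    idealisation; expression differences break this in real RNA-seq data).
    """
    by_fraction = defaultdict(list)
    for combo in combinations_with_replacement(GENOTYPES, n_individuals):
        alt_copies = sum(GENOTYPES[g] for g in combo)
        fraction = Fraction(alt_copies, 2 * n_individuals)
        by_fraction[fraction].append(combo)
    return dict(by_fraction)


if __name__ == "__main__":
    # For a pool of 2, list which genotype combinations share a fraction.
    for fraction, combos in sorted(pool_configurations(2).items()):
        print(fraction, combos)
```

For a pool of two, an alt-allele fraction of 1/2 is produced both by two heterozygotes and by one ref/ref plus one alt/alt individual, so the pooled reads alone cannot separate the two cases even under these idealised assumptions.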
No, you can't always tell even in singly sequenced samples. You can call heterozygous sites in those, but not homozygous ones, since you could also be looking at allele-specific expression. That's among the many issues with using RNA-seq for variant calling.
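As a rough illustration of the allele-specific expression point, the sketch below computes how likely it is that a truly heterozygous site shows no reads at all from one allele. It assumes a simple binomial sampling model in which each read independently carries that allele with a fixed probability; the function name `prob_allele_unseen` and the example numbers are hypothetical, not taken from any real dataset.

```python
def prob_allele_unseen(depth, allele_fraction):
    """Probability that none of `depth` reads carries a given allele.

    Simple binomial model (an assumption): each read independently carries
    the allele with probability `allele_fraction`, i.e. that allele's share
    of the expressed transcripts at the site.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be between 0 and 1")
    return (1.0 - allele_fraction) ** depth


if __name__ == "__main__":
    # Balanced expression versus strongly skewed expression at 20x depth.
    for fraction in (0.5, 0.05):
        print(fraction, prob_allele_unseen(20, fraction))
```

With balanced expression the second allele is almost never missed at moderate depth, but when one allele makes up only a small share of the transcripts, a heterozygous site can easily look homozygous in the reads.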
Oh, yes, you are right. Since one allele may not be expressed, a truly heterozygous position could be called homozygous. Well, I will have to work within these limitations. Thanks again.