I have aligned RNA-Seq reads with STAR and want to genotype the samples from these reads. I have genotyped from RNA-Seq experiments before, but those reads were aligned with TopHat2. This time the dataset is too large to run TopHat2...
The SplitNCigarReads tool, which is primarily meant for splitting reads with N operators in their CIGAR strings into individual exon segments and hard-clipping any sequence overhanging into the intronic regions, also allows you to reassign the mapping quality. The command below is taken from the GATK website. However, it only reassigns the mapping quality (MAPQ), not base qualities. Why would you want to reassign base qualities? If you meant base quality recalibration, you can use BQSR in GATK.
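To make the two jobs mentioned above concrete, here is a minimal Python sketch (not GATK's implementation) of splitting one alignment at its N operators into per-exon segments, plus the MAPQ reassignment. It assumes SAM conventions: POS is the 1-based leftmost aligned base, and STAR writes MAPQ 255 for uniquely mapped reads, which the SAM spec treats as "unavailable"; the 255 to 60 mapping mirrors what the GATK RNA-seq workflow does. The function names `split_on_n` and `reassign_mapq` are hypothetical, and the hard-clipping of intronic overhangs is not modelled.

```python
import re
from typing import List, Tuple

# One CIGAR element: a length followed by an operator (SAM spec operators).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Turn e.g. "10M500N20M" into [(10, "M"), (500, "N"), (20, "M")]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def split_on_n(pos: int, cigar: str) -> List[Tuple[int, str]]:
    """Split one spliced alignment into per-exon (start, cigar) segments.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    Hard-clipping of intronic overhangs, which SplitNCigarReads also
    performs, is deliberately not modelled here.
    """
    segments: List[Tuple[int, str]] = []
    seg_start = pos
    ref_pos = pos
    current: List[str] = []
    for length, op in parse_cigar(cigar):
        if op == "N":
            # An N (skipped reference, typically an intron) ends the current segment.
            if current:
                segments.append((seg_start, "".join(current)))
            current = []
            ref_pos += length
            seg_start = ref_pos
        else:
            current.append(f"{length}{op}")
            if op in "MD=X":
                # Only these operators consume reference bases (N handled above).
                ref_pos += length
    if current:
        segments.append((seg_start, "".join(current)))
    return segments


def reassign_mapq(mapq: int, from_q: int = 255, to_q: int = 60) -> int:
    """STAR writes MAPQ 255 for unique mappers; SAM treats 255 as
    'unavailable', so map it to a usable value such as 60."""
    return to_q if mapq == from_q else mapq


if __name__ == "__main__":
    print(split_on_n(100, "10M500N20M"))  # [(100, '10M'), (610, '20M')]
    print(reassign_mapq(255), reassign_mapq(3))  # 60 3
```

Note that leading soft clips stay attached to the first segment and do not shift its start, because S does not consume reference bases.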
Yes, I have quite a bit of experience :-). But I don't know exactly what you want me to talk about. Genotyping using RNA-seq works fine, but you are interrogating less than two percent of the whole genome. In short, you may not be able to get enough markers/genotypes to perform an association study with much precision. We perform RNA-seq genotyping to check for strain and sex assignment errors, which are pretty common in big experiments involving 200-500 mice. Mendelian genes show clear segregation and help to rectify the strain assignment. For sex, you can use genes such as Xist (female) or Ddx3y (male).
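The two QC checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual pipeline: it assumes you already have normalized Xist and Ddx3y expression per sample and a table of expected genotypes per strain at informative markers. The expression cut-off `min_expr`, the marker ids and the strain names are all hypothetical.

```python
from typing import Dict, Optional

Genotypes = Dict[str, str]  # marker id -> genotype string, e.g. "AG"


def infer_sex(xist: float, ddx3y: float, min_expr: float = 10.0) -> Optional[str]:
    """Guess sex from normalized Xist and Ddx3y expression.

    min_expr is a hypothetical cut-off; choose one from the expression
    distribution in your own samples.
    """
    female = xist >= min_expr
    male = ddx3y >= min_expr
    if female and not male:
        return "F"
    if male and not female:
        return "M"
    return None  # both or neither expressed: flag for manual review


def concordance(sample: Genotypes, expected: Genotypes) -> Optional[float]:
    """Fraction of shared markers where the sample matches the expected strain."""
    shared = [m for m in sample if m in expected]
    if not shared:
        return None
    return sum(sample[m] == expected[m] for m in shared) / len(shared)


def best_strain(sample: Genotypes, strains: Dict[str, Genotypes]) -> Optional[str]:
    """Return the strain whose expected genotypes best match the sample."""
    scores = {}
    for name, expected in strains.items():
        score = concordance(sample, expected)
        if score is not None:
            scores[name] = score
    return max(scores, key=scores.get) if scores else None


# Hypothetical strains and markers, for illustration only.
strains = {
    "strainA": {"m1": "AA", "m2": "GG", "m3": "CC"},
    "strainB": {"m1": "TT", "m2": "AA", "m3": "CC"},
}
sample = {"m1": "AA", "m2": "GG", "m3": "CT"}
print(best_strain(sample, strains))  # strainA
print(infer_sex(xist=85.0, ddx3y=0.2))  # F
```

Samples whose best-matching strain or inferred sex disagrees with the sample sheet are the ones to re-check.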
Here is a quote from the GATK website: "For RNA-seq, we evaluated all the major software packages that are specialized in RNAseq alignment, and we found that we were able to achieve the highest sensitivity to both SNPs and, importantly, indels, using STAR aligner". I have never compared them, but I believe that STAR and TopHat2 will give you more or less the same results for genotyping.
The methods are comparable, but for RNA-seq you should use a splice-aware aligner and also run SplitNCigarReads, which I described above. You should also exclude soft-clipped bases from variant calling to minimize false-positive variants: soft-clipped bases are, by definition, not aligned to the reference at that position, and in RNA-seq they often come from unrecognized splice-junction overhangs or leftover adapter sequence. All these steps are part of the GATK best practices for RNA-seq variant calling described here (https://www.broadinstitute.org/gatk/guide/article?id=3891). Maybe you haven't seen it.
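As a rough illustration of why soft clips matter, the Python sketch below counts the bases observed at one reference position while stepping over soft-clipped bases, so they never contribute allele support. It is a toy pileup under stated assumptions (reads given as hypothetical `(pos, cigar, seq)` tuples with SAM-style 1-based POS), not how any variant caller works internally; in GATK4, HaplotypeCaller exposes this behaviour as the `--dont-use-soft-clipped-bases` option.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(pos: int, cigar: str, seq: str) -> Dict[int, str]:
    """Map 1-based reference positions to read bases for aligned bases only.

    Soft-clipped (S) bases are present in SEQ but not aligned, so they are
    stepped over in the read without being given a reference position.
    """
    out: Dict[int, str] = {}
    ref, qry = pos, 0
    for n, op in CIGAR_RE.findall(cigar):
        length = int(n)
        if op in "M=X":
            # Aligned bases: consume both read and reference.
            for i in range(length):
                out[ref + i] = seq[qry + i]
            ref += length
            qry += length
        elif op in "IS":
            # Insertions and soft clips consume the read only.
            qry += length
        elif op in "DN":
            # Deletions and introns consume the reference only.
            ref += length
        # H and P consume neither.
    return out


def allele_counts(reads: Iterable[Tuple[int, str, str]], site: int) -> Counter:
    """Count bases observed at `site`, ignoring soft-clipped bases."""
    counts: Counter = Counter()
    for pos, cigar, seq in reads:
        base = aligned_bases(pos, cigar, seq).get(site)
        if base is not None:
            counts[base] += 1
    return counts


reads = [
    (100, "5M", "ACGTA"),
    (100, "3S5M", "GGGACGTA"),
    (98, "2M3S", "TTGGG"),  # would put a G at 100 if soft clips were used
]
print(allele_counts(reads, 100))  # Counter({'A': 2})
```

Counting the clipped G of the third read would have produced spurious support for a G allele at position 100.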
If you want to genotype from RNA-seq data, I suggest you use BBMap, which is more sensitive to both SNPs and indels than STAR or TopHat2. Genotyping from RNA-seq is, of course, not the same as genotyping from DNA: you are never assured of equal coverage of both alleles, or indeed of any coverage of one allele, or of any coverage at all for most genes.
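The allele-coverage problem can be seen with a simple binomial genotype caller. This is a minimal sketch, assuming independent reads and a single per-base error rate; the `error` and `min_depth` defaults are illustrative, not recommended values, and real callers model much more than this.

```python
import math
from typing import Optional


def call_genotype(ref: int, alt: int, error: float = 0.01,
                  min_depth: int = 10) -> Optional[str]:
    """Pick the most likely diploid genotype from ref/alt read counts.

    Returns None (no call) below min_depth. error and min_depth are
    illustrative defaults, not recommended values.
    """
    if not 0.0 < error < 0.5:
        raise ValueError("error must be between 0 and 0.5")
    if ref + alt < min_depth:
        return None
    # Expected alt-read fraction under each genotype.
    expected = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    # Binomial log-likelihood; the binomial coefficient is the same for
    # every genotype, so it is left out.
    loglik = {g: alt * math.log(p) + ref * math.log(1.0 - p)
              for g, p in expected.items()}
    return max(loglik, key=loglik.get)


print(call_genotype(10, 10))  # 0/1
print(call_genotype(2, 3))    # None (too little coverage)
print(call_genotype(18, 2))   # 0/0, although a het with strong allelic imbalance looks like this
```

The last call shows the caveat: if one allele is barely expressed, a true heterozygote is indistinguishable from a homozygote using expression data alone, and unexpressed genes yield no call at all.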
Thanks a lot! That's a really helpful post!