I am looking for a method to remove paralogous alignments from aligned RNA-seq data. I have already applied MAPQ filters.
The usual way to identify paralogous regions in genome resequencing data is to look at the depth of coverage and flag regions whose coverage is above the 97.5th percentile, or more than 4 standard deviations above the mean coverage. However, this method cannot be applied to RNA-seq data, because coverage there also depends on expression level.
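To be concrete, the depth filter I mean for resequencing data could be sketched roughly like this in Python. This is only an illustration: the function name `high_depth_sites`, the nearest-rank percentile, and the input (a plain list of per-site depths) are my own assumptions, not taken from any particular tool.

```python
import math
import statistics

def high_depth_sites(depths, pct=97.5, n_sd=4.0):
    """Return indices of sites whose depth exceeds either cutoff.

    depths: per-site coverage values (hypothetical input, e.g. parsed
    from a depth table). pct and n_sd mirror the two thresholds above.
    """
    mean = statistics.fmean(depths)
    sd = statistics.pstdev(depths)

    # Nearest-rank percentile: smallest value with at least pct% of sites at or below it.
    ranked = sorted(depths)
    k = max(0, math.ceil(pct / 100 * len(ranked)) - 1)
    pct_cutoff = ranked[k]

    sd_cutoff = mean + n_sd * sd

    # A site is flagged if it exceeds the percentile cutoff OR the SD cutoff.
    return [i for i, d in enumerate(depths) if d > pct_cutoff or d > sd_cutoff]

# Example: 99 sites at 10x and one collapsed-paralog-like site at 200x.
flagged = high_depth_sites([10] * 99 + [200])
```

In this toy example only the last site is flagged. With RNA-seq the same outlier could simply be a highly expressed gene, which is why I do not think the filter transfers.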
Instead, I was wondering whether the number of alleles at a locus could be used to detect paralogous reads in RNA-seq data. Reason: in a diploid genome, no site can carry more than 2 alleles, except through sequencing error. So, if a site shows more than 2 alleles with roughly equal coverage:
1) This could help us identify potential paralogous alignment sites.
2) The next step would then be to separate the paralogous reads from the non-paralogous ones at those sites.
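A minimal sketch of step 1, assuming per-site base counts have already been extracted from the alignments (for example from a pileup). The names `count_supported_alleles` and `flag_multiallelic_sites`, and the cutoffs `min_frac=0.10` and `min_depth=10`, are hypothetical values I picked for illustration; they would need tuning. Step 2 (assigning individual reads) is not covered here.

```python
def count_supported_alleles(base_counts, min_frac=0.10, min_depth=10):
    """Count alleles at one site that are unlikely to be sequencing errors.

    base_counts: dict mapping base -> read count at the site.
    min_frac:    minimum fraction of the depth for an allele to count
                 (assumed cutoff to ignore low-frequency errors).
    min_depth:   sites with fewer reads are skipped (returns 0).
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return 0
    return sum(1 for c in base_counts.values() if c / depth >= min_frac)

def flag_multiallelic_sites(pileup, ploidy=2, **cutoffs):
    """Return sites with more well-supported alleles than the ploidy allows."""
    return [
        site
        for site, counts in pileup.items()
        if count_supported_alleles(counts, **cutoffs) > ploidy
    ]

# Toy data: (contig, position) -> base counts. Purely made up.
pileup = {
    ("chr1", 100): {"A": 30, "C": 28, "G": 31, "T": 1},  # three alleles, similar coverage
    ("chr1", 200): {"A": 50, "G": 48, "T": 2},           # ordinary heterozygous site
    ("chr1", 300): {"A": 3, "C": 3, "G": 3},             # too shallow to judge
}
suspects = flag_multiallelic_sites(pileup)
```

Here only the site at position 100 is flagged: the single `T` read falls below the error cutoff, and the site at position 300 is skipped for low depth.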
But I have found no tools that do this.
Has anyone had a similar problem, or found a solution?
Thanks,