I have downloaded a high-quality reference transcriptome (built from both long 454 reads and short reads) for my species, from population A. I also have Illumina reads from my own experiment (a few replicates, some treatments and a control), all from population B.
I aligned the reads to the reference with bwa-mem. A few things bother me:
Only about 70% of the reads align to the reference, so I don't know what the remaining 30% are.
Looking at selected genes in IGV, I see many SNPs in all samples from population B (by eye, roughly 10 SNPs per 100 bp). I suspect this could interfere with the alignment algorithm.
It may be that my treated plants have activated expression of some genes that were not expressed in the reference plant and are therefore absent from the reference. These genes (if they exist) would be of special interest to me.
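For context, this is roughly how the mapped fraction and the list of unaligned reads can be pulled out of the bwa-mem output. It is only a minimal sketch working on SAM text lines: it relies on the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary), and the function name and the toy records are made up for illustration. For paired-end data each mate is counted as its own record.

```python
from typing import Iterable, List, Tuple

UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def mapping_summary(sam_lines: Iterable[str]) -> Tuple[int, int, List[str]]:
    """Return (mapped, total, unmapped_read_names) over primary records only."""
    mapped = 0
    total = 0
    unmapped_names: List[str] = []
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Count each read once: ignore secondary and supplementary records.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if flag & UNMAPPED:
            unmapped_names.append(name)
        else:
            mapped += 1
    return mapped, total, unmapped_names


# Toy records (hypothetical), not real data from the experiment.
toy_sam = [
    "@SQ\tSN:tx1\tLN:1000",
    "r1\t0\ttx1\t10\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\ttx1\t200\t60\t50M\t*\t0\t0\t*\t*",
    "r1\t2048\ttx1\t400\t60\t20M30H\t*\t0\t0\t*\t*",
]
mapped, total, unmapped = mapping_summary(toy_sam)
print(f"{mapped}/{total} reads mapped; unmapped: {unmapped}")
```

The names collected in the unmapped list are the reads I would want to pass on to a de novo assembler.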
I would like to combine the information in the downloaded reference transcriptome with my own data to produce a new reference tailored to my population, but I am not sure whether this is possible. I am looking for a tool that would help me do it. My naive idea is that it would map the reads to the reference and then change SNPs and other differences according to the reads from population B. I would then try to de novo assemble the unaligned reads.
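To make the first half of that idea concrete, here is a toy sketch of what I mean by "changing SNPs" in the reference. It is not an existing tool: the function name, the 0-based coordinates and the example sequence are all assumptions for illustration, and it handles only single-base substitutions (no indels).

```python
from typing import Dict, Tuple


def apply_snps(reference: str, snps: Dict[int, Tuple[str, str]]) -> str:
    """Return a copy of `reference` with substitutions applied.

    `snps` maps a 0-based position to (expected_ref_base, alt_base).
    """
    bases = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if not 0 <= pos < len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        # Guard against coordinate mix-ups: the reference must carry the
        # base that the variant call says it replaces.
        if bases[pos] != ref_base:
            raise ValueError(
                f"expected {ref_base} at position {pos}, found {bases[pos]}"
            )
        bases[pos] = alt_base
    return "".join(bases)


# Hypothetical transcript from population A and two SNPs seen in population B.
transcript_a = "ACGTACGT"
population_b_snps = {1: ("C", "T"), 6: ("G", "A")}
print(apply_snps(transcript_a, population_b_snps))  # ATGTACAT
```

A real solution would also need to call the variants from the alignments first and to handle insertions and deletions, which shift coordinates.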
My search so far has yielded:
tools for de novo transcriptome assembly - if I understand correctly, I could run these on data from my population, but the information in the high-quality reference would be lost
tools for reference-based transcriptome assembly - here the reference seems to be a genome, not a transcriptome, so this is not possible for me
tools for genome reassembly - I guess this is similar to what I want, but it works only for genomes
Am I just bad at searching, or does such a tool not exist?
I think you should try Trinity's genome-guided assembly option. For the BAM option (--genome_guided_bam), you can use the BAM file that came from your transcriptome alignment. Another option is rnaSPAdes (rnaspades.py), although it is recommended for bacterial genomes.
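As a rough sketch of the Trinity suggestion, the snippet below only builds and prints the command line; it does not run anything. The BAM file name, the intron limit, the memory and CPU values and the output directory are placeholders I made up. Trinity expects a coordinate-sorted BAM here, and I have not tested how well this mode behaves when the BAM comes from a transcriptome rather than a genome alignment.

```python
import shlex
from typing import List


def trinity_guided_command(
    bam_path: str,
    max_intron: int,
    max_memory: str = "10G",
    cpus: int = 4,
    output_dir: str = "trinity_gg_out",
) -> List[str]:
    """Build (but do not run) a genome-guided Trinity command line."""
    return [
        "Trinity",
        "--genome_guided_bam", bam_path,          # coordinate-sorted BAM
        "--genome_guided_max_intron", str(max_intron),
        "--max_memory", max_memory,
        "--CPU", str(cpus),
        "--output", output_dir,
    ]


# Placeholder values; adjust them to your data and machine.
cmd = trinity_guided_command("populationB.sorted.bam", max_intron=1000)
print(shlex.join(cmd))
```

Once the printed command looks right, it can be pasted into a shell or passed to subprocess.run(cmd, check=True).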