Dear All,
I would like your comments on an issue. I have two transcriptome assemblies (two samples) of one species. The conditions are the same for both assemblies. However:
- The numbers of genes and transcripts are different.
- When I cluster the two assemblies with cd-hit and vsearch, I find only 35% overlap between the two samples (I mean that 35% of the sequences in the first transcriptome are found in the second transcriptome; protein clustering gives almost the same result).
- When I map the RNA reads of the first sample to the second transcriptome, I get a 99% mapping rate. When I map the RNA reads of the second sample to the first transcriptome, I get a ~98% mapping rate.
What I would like to learn from you is why the sequence clustering ratio is so low.
We believe that these two samples belong to the same species (though the sex may be different).
Thank you.
Is it possible that the two samples were processed through different assembly pipelines?
Are you sure both samples are from the same species? Could there be contamination? Extract a 1 kb section from a gene in both samples and run a BLAST search on each. Do they both return the same species?
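If it helps, here is a minimal sketch of the extraction step in plain Python. The function names and the idea of picking a record by its ID are only assumptions for illustration; any FASTA tool would do the same job.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}.

    The record ID is taken as the first whitespace-separated word
    of the header line (an assumption about how the IDs are written).
    """
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_section(records: Dict[str, str], record_id: str,
                    start: int = 0, length: int = 1000) -> str:
    """Return up to `length` bases of one record, starting at 0-based `start`."""
    seq = records[record_id]
    return seq[start:start + length]
```

You would call read_fasta on an open file handle for each assembly, pull the same gene out of both with extract_section, and paste the two 1 kb sequences into a BLAST search.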
Based on COX1 gene sequencing, the two samples are the same species. In addition, the cross-mapping rate (RNA reads of each sample mapped to the other sample's transcriptome) is very high (over 95%). This suggests to us that the two samples are the same species. But clustering the two transcriptomes gave 35%, meaning that only 35% of the sequences from the first sample are found in the transcriptome of the second sample. So I am confused.
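To make clear what the 35% means, here is a rough sketch of how such a figure could be computed from a cd-hit .clstr file. It assumes the two assemblies were concatenated before clustering and that the sequence IDs were given the hypothetical prefixes "s1_" and "s2_" beforehand; neither detail is necessarily how the numbers above were obtained.

```python
import re
from typing import Iterable, List

# In a .clstr file each member line looks like:
#   0	1500nt, >s1_t1... *
# so the sequence name sits between '>' and '...'.
_MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Return one list of member names per cluster."""
    clusters: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            clusters.append([])
        elif line and clusters:
            match = _MEMBER.search(line)
            if match:
                clusters[-1].append(match.group(1))
    return clusters


def shared_fraction(clusters: List[List[str]],
                    prefix_a: str, prefix_b: str) -> float:
    """Fraction of sample-A sequences that share a cluster with
    at least one sample-B sequence."""
    total_a = 0
    shared_a = 0
    for members in clusters:
        n_a = sum(1 for m in members if m.startswith(prefix_a))
        has_b = any(m.startswith(prefix_b) for m in members)
        total_a += n_a
        if has_b:
            shared_a += n_a
    return shared_a / total_a if total_a else 0.0
```

Note that the result is not symmetric: shared_fraction(clusters, "s1_", "s2_") and shared_fraction(clusters, "s2_", "s1_") can differ when the two assemblies contain different numbers of transcripts.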