Question

Low sequence similarity between two samples of the same species

7.9 years ago

Mehmet ▴ 820

Dear All,

I would like your comments on an issue. I have two transcriptome assemblies (one per sample) from a single species. The conditions were the same for both assemblies. However:

1. The numbers of genes and transcripts differ between the two assemblies.
2. When I cluster the two assemblies with CD-HIT and VSEARCH, I find only 35% similarity between the two samples (that is, 35% of the sequences in the first transcriptome are found in the second transcriptome; protein clustering gives almost the same result).
3. When I map the RNA reads of the first sample to the second transcriptome, the mapping rate is 99%. When I map the RNA reads of the second sample to the first transcriptome, the mapping rate is about 98%.

What I want to learn from you is why the sequence clustering ratio is so low.

We believe that these two samples belong to the same species (their sex may differ).

Thank you.

gene sequencing sequence Assembly alignment • 1.6k views

ADD COMMENT • link updated 7.9 years ago by h.mon 35k • written 7.9 years ago by Mehmet ▴ 820

Is it likely that both samples were processed through different assembly pipelines?

ADD REPLY • link 7.9 years ago by Sej Modha 5.3k

Are you sure both samples are from the same species? Could there be contamination? Extract a 1 kb section from a gene in both samples and run a BLAST search on each. Do they both return the same species?

ADD REPLY • link 7.9 years ago by BioinfGuru ★ 2.1k
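For the "extract a 1 kb section" step suggested above, a minimal sketch in plain Python might look like the following. The record IDs, the toy sequences, and the function names are hypothetical and only illustrate the idea; a real run would read the assembly FASTA files and paste the output into a BLAST search.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted file handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Keep only the first word of the header as the record ID.
            record_id = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def first_bases_as_fasta(handle, wanted_id, length=1000):
    """Return the first `length` bases of one record as a FASTA string."""
    for record_id, seq in read_fasta(handle):
        if record_id == wanted_id:
            end = min(length, len(seq))
            return f">{record_id}_1-{end}\n{seq[:end]}\n"
    raise KeyError(f"record {wanted_id!r} not found")


# Toy input standing in for an assembly file (hypothetical IDs and sequences).
toy_assembly = StringIO(">tx1 example transcript\nACGT\nACGT\n>tx2\nGGGG\n")
snippet = first_bases_as_fasta(toy_assembly, "tx1", length=6)
print(snippet, end="")
```

With real data, the `StringIO` object would be replaced by `open("assembly.fasta")` (a placeholder file name) and `length` left at its default of 1000.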

Based on COX1 gene sequencing, the two samples are the same species. In addition, the mapping rate of each sample's RNA reads to the other sample's transcriptome (cross-mapping) is very high (over 95%). Together, these suggest that the two samples belong to the same species. But clustering the two transcriptomes gave 35%, meaning that only 35% of the sequences from the first sample are found in the transcriptome of the second sample. So I am confused.

ADD REPLY • link 7.9 years ago by Mehmet ▴ 820

score 0 · Answer 1 · 2017-04-27

7.9 years ago

Mehmet ▴ 820

No, the same process was applied to both assemblies.

ADD COMMENT • link 7.9 years ago by Mehmet ▴ 820

score 0 · Answer 2 · 2017-04-29

I think at least part of this discrepancy may be explained by differential alternative splicing and/or contamination with intronic sequence in your samples. For example, I know that CD-HIT, with its default settings, will not cluster alternative transcripts that differ by an internal exon present in only one of them. However, the mapping rate would still be very high with a local aligner, such as BWA. So I don't see the low clustering ratio and the high mapping rate as being at odds with each other.
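
To make the point above concrete, here is a toy sketch in Python. It is not CD-HIT's or BWA's actual algorithm. It uses the longest ungapped shared block as a crude stand-in for an alignment that cannot bridge a large internal gap, and a simple exact seed match as a stand-in for local read mapping. The exon lengths, the read and seed lengths, and the 0.9 threshold are all illustrative assumptions.

```python
import random
from difflib import SequenceMatcher


def random_seq(n, rng):
    """Return a random DNA string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def longest_shared_block_coverage(a, b):
    """Longest ungapped shared block, as a fraction of the shorter sequence."""
    # autojunk must be off: with a 4-letter alphabet every base would be "junk".
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def seed_hit_fraction(source, reference, read_len=50, seed_len=25):
    """Fraction of all read_len windows of `source` sharing a seed with `reference`."""
    ref_seeds = {reference[i:i + seed_len]
                 for i in range(len(reference) - seed_len + 1)}
    reads = [source[i:i + read_len]
             for i in range(len(source) - read_len + 1)]
    hits = sum(
        any(read[j:j + seed_len] in ref_seeds
            for j in range(read_len - seed_len + 1))
        for read in reads
    )
    return hits / len(reads)


rng = random.Random(0)
exon1, exon2, exon3 = random_seq(300, rng), random_seq(60, rng), random_seq(300, rng)
isoform_with_exon = exon1 + exon2 + exon3   # transcript as assembled in sample 1
isoform_skipping = exon1 + exon3            # transcript as assembled in sample 2

THRESHOLD = 0.9  # illustrative clustering threshold (assumption)
coverage = longest_shared_block_coverage(isoform_with_exon, isoform_skipping)
reads_1_to_2 = seed_hit_fraction(isoform_with_exon, isoform_skipping)
reads_2_to_1 = seed_hit_fraction(isoform_skipping, isoform_with_exon)

print(f"shared-block coverage: {coverage:.2f} (clusters: {coverage >= THRESHOLD})")
print(f"reads from isoform 1 with a seed hit in isoform 2: {reads_1_to_2:.2f}")
print(f"reads from isoform 2 with a seed hit in isoform 1: {reads_2_to_1:.2f}")
```

In this toy setup the two isoforms share only about half of the shorter sequence as one contiguous block, so they would not pass the threshold, while most simulated reads still find a seed hit in the other isoform. Only reads lying almost entirely inside the skipped exon fail to match.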