Question

Question about homologous concept in sequencing.

10.4 years ago

mangfu100 ▴ 810

Hi

While reading a paper titled "An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data", I had trouble understanding one passage.

Below is the paragraph I find difficult. I need someone to help me understand a concept that is fuzzy to me.

I marked in bold the text that I cannot understand.

Previous work on gene fusion detection from RNA-Seq

FusionSeq has been used to identify fusions in prostate tumor samples and cell lines [10,11]. While the methods used for these studies are capable of identifying genuine gene fusions, many challenges and limitations remain in the analysis of RNA-Seq data. For example, the aforementioned studies only considered reads that align uniquely to the genome. However, errors in next generation sequencing together with homologous and repetitive sequences shared between genes often produce ambiguous alignments of the short reads generated in RNA-Seq experiments. While resolving the 'correct' placement of these reads is often not possible, we propose that ambiguously aligning reads provide important evidence of real gene fusions, and therefore should be leveraged by analysis methods

First, I am confused about the word "homologous".

I already know the concept of homology, but I can't connect it to the sequencing field.

Also, why not "hetero" instead of "homo"? I thought "homologous" always refers to the same DNA in a chromosome pair.

Second, what does "shared between genes" mean?

I hope you understand, because I am a complete beginner in this field.

I am looking forward to your reply.

Thank you!

sequencing RNA-Seq alignment gene next-gen • 2.6k views

updated 3.0 years ago by Ram 44k • written 10.4 years ago by mangfu100 ▴ 810

Ram · Accepted Answer · 2014-07-29

The clunky sentence that causes the confusion would be simpler if "homologous" were replaced with "paralogous gene families with high sequence similarity" (quasi-repeats, if you like).

The issue is the technical ability to distinguish real chimeric transcripts, which arise from chromosomal rearrangements in vivo, from artifacts generated in silico when contigs are assembled from RNA-seq data. As said, aligning against the tumor genome may discriminate one from the other (on a good day).

score 2 · Accepted Answer · 2014-07-29

The main problem is that you cannot map every read uniquely, and some reads cannot be mapped at all. "Uniquely" means to exactly one position in the genome.

Here the authors name two different reasons: the first is homologous genes and the second is repetitive sequences. The second case is quite obvious: there are regions in the genome that are highly repetitive (some examples: http://en.wikipedia.org/wiki/Repeated_sequence_(DNA)). If you try to map a read against these regions, you will get multiple matches, so you cannot easily decide which one is correct. And if the repeats are shared between genes, it gets even worse.
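To make the multiple-match problem concrete, here is a minimal sketch in Python. It is not how a real aligner works (real tools use indexed, mismatch-tolerant search); the toy reference, the repeat unit and the function name `find_all_positions` are all made up for illustration.

```python
def find_all_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping hits are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: the repeat unit occurs in two separate places.
REPEAT = "ACGTACGT"
reference = "TTGACC" + REPEAT + "GGATCCA" + REPEAT + "CTTAGG"

unique_read = "GGATCCA"   # taken from the non-repetitive middle part
repeat_read = REPEAT      # taken from inside the repeat

print(find_all_positions(reference, unique_read))  # one position: uniquely mapped
print(find_all_positions(reference, repeat_read))  # two positions: ambiguous
```

The second read matches equally well at both copies of the repeat, so there is no way to tell from the read alone which copy it came from.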

About the homologous gene problem I am not sure, but two genes are homologous to each other if they share a common ancestor, even though their sequences can differ. If you map against your reference genome, a read might fail to map because it comes from an altered homologous gene.
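A rough sketch of this, again in Python and under my own assumptions: the two gene sequences are invented, `map_read` is a hypothetical helper, and mismatches are counted with a simple Hamming distance instead of a real alignment score.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, genes: dict[str, str], max_mismatches: int) -> list[tuple[str, int, int]]:
    """Return (gene name, start, mismatches) for every window within the mismatch limit."""
    hits = []
    for name, seq in genes.items():
        for i in range(len(seq) - len(read) + 1):
            d = hamming(read, seq[i:i + len(read)])
            if d <= max_mismatches:
                hits.append((name, i, d))
    return hits


# Two invented homologous genes that differ at only two bases.
gene_a = "ATGGCCTTGACCGATCGTAA"
gene_b = "ATGGCGTTGACCGATAGTAA"

read_from_b = gene_b[3:15]  # read that covers one of the differing bases

# Reference contains only gene A: an exact search finds nothing.
print(map_read(read_from_b, {"geneA": gene_a}, max_mismatches=0))
# Allowing one mismatch places the read on gene A, although it came from gene B.
print(map_read(read_from_b, {"geneA": gene_a}, max_mismatches=1))
# With both genes in the reference and one mismatch allowed, the read hits both.
print(map_read(read_from_b, {"geneA": gene_a, "geneB": gene_b}, max_mismatches=1))
```

So depending on how strict the mapping is, the same read is either lost, placed on the wrong gene, or reported as ambiguous between the two homologs.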