Question

To identify 3'end extended small ncRNA from small RNA seq data

1

10.3 years ago

sethugunja ▴ 60

Hi,

We recently sequenced RNA samples ranging from 70 to 200 nt on an Illumina HiSeq platform. Here are the details:

Type of seq: Small RNA-seq (size 70-200 nt)

Seq platform: Illumina HiSeq 2000

Read length: 50 nt, single-end

Conditions: Normal (3 replicates) vs Patient (3 replicates)

Reads: ~17 million reads per replicate

Aim: To identify the 3'-end extended sequences (poly-A tails) in the snoRNAs (immature snoRNAs)

Is there a particular pipeline or tool for finding them?

Any other suggestions are welcome.

Sethu

small-RNAseq ncRNA poly-A RNA-Seq • 3.4k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by sethugunja ▴ 60

0

Hopefully others will reply with a premade tool, but I would think the general idea would be to first perform mapping as normal and then take the unmapped reads and split them to allow anchoring. You'd then try to map the anchors. The 3' extension would be the remainder of an anchored read that runs past the point where the read maps at the 3' end of the RNA. This is sort of how TopHat works, though it'd make more sense to simply write a custom pipeline than to modify TopHat.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Devon Ryan 104k

0

Thank you for your prompt reply,

Can you please explain how to split and anchor them?

Sethu

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by sethugunja ▴ 60

0

Whatever program you write or find would take each read and split it into non-overlapping segments (of maybe 15-20 bases each). In this context, anchoring simply means mapping these segments to either the genome or a library of small RNAs (the latter is probably more efficient).
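
To make the split-and-anchor idea concrete, here is a minimal Python sketch. It is not an existing tool: the function names are hypothetical, the 18-base anchor size is just a value inside the 15-20 range mentioned above, and exact string matching against a single reference sequence stands in for a real aligner that would tolerate mismatches.

```python
def split_into_anchors(read, size=18):
    """Cut a read into consecutive, non-overlapping segments of `size` bases.

    Returns (offset_in_read, segment) pairs. A trailing piece shorter than
    `size` is dropped.
    """
    return [(i, read[i:i + size]) for i in range(0, len(read) - size + 1, size)]


def three_prime_extension(read, reference, size=18):
    """Return the part of `read` that runs past the 3' end of `reference`.

    Returns "" if the read ends inside the reference, and None if no anchor
    matches. Assumption: the bases between the anchor and the end of the
    reference are not re-checked against the reference here.
    """
    for offset, anchor in split_into_anchors(read, size):
        pos = reference.find(anchor)  # exact match stands in for a real aligner
        if pos == -1:
            continue
        # Number of reference bases available from the anchor start onward.
        bases_on_reference = len(reference) - pos
        overhang_start = offset + bases_on_reference
        if overhang_start < len(read):
            return read[overhang_start:]
        return ""
    return None
```

In practice you would run this only on reads that failed end-to-end mapping, with `reference` being each mature snoRNA sequence (or the anchors mapped with a proper aligner), and then inspect the returned extensions for runs of A.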

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-08-21

1

10.3 years ago

Chirag Nepal ★ 2.4k

You should start by mapping to the genome, as Devon suggested. Then look at how the reads map to the snoRNAs, if that is your interest.

In my opinion, since your library size is 70-200 nt, you have size-selected to enrich for snoRNAs and snRNAs, but not miRNAs or other small RNAs (such as those derived from the ends of snoRNAs). I am not sure you can find extended poly-A tails in snoRNAs, simply because of the way snoRNAs are processed. snoRNAs are generally encoded in introns and excised by the spliceosome, so there might not be a poly-A tail specific to the snoRNA, while the poly-A tail of the snoRNA host gene will be downstream of the host gene's 3' UTR.

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Chirag Nepal ★ 2.4k

1

Hi Chirag,

We know that there are poly-A tails on immature snoRNAs in the patient samples because we have measured the abundance of polyadenylated snoRNAs alone and of total snoRNAs. The abundance of polyadenylated snoRNAs was considerably higher in Patient vs Normal (qPCR data). This suggests that the proportion of immature snoRNAs is high in the patient samples. So we sequenced to identify the sequence and length of the poly-A tails on particular snoRNAs.

Here's the picture showing the maturation of H/ACA box snoRNAs.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by sethugunja ▴ 60

1

I've inlined and shrunk the image. I suspect that most of us have institutional access.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Devon Ryan 104k

score 1 · Answer 2 · 2014-08-21

1

10.3 years ago

Ido Tamir 5.2k

I would use an aligner that allows incomplete alignments, like bowtie2 in local mode or bwa mem. Then filter by minimum alignment length, 5' or 3' soft-clipped sequences, etc. This will, however, require multiple runs to get the alignment parameters right. A training dataset would be very valuable for reducing the false-negative rate without becoming overly sensitive. Take some known small RNAs, clip or extend them a little (at the 5' and 3' ends), align this dataset to the genome or a non-coding RNA file, and check whether you can recover all of them.
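
As a rough illustration of the soft-clip filtering step, the Python sketch below takes the FLAG, CIGAR, and SEQ columns of a SAM record (as written by a local aligner) and returns the bases clipped from the 3' end of the original read. The function names and the poly-A thresholds (`min_length=5`, `min_fraction=0.8`) are assumptions chosen for illustration, not recommended settings; they would need tuning on a test set like the one described above.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def three_prime_soft_clip(flag, cigar, seq):
    """Return the soft-clipped bases at the 3' end of the original read.

    Returns "" if the read is unmapped or nothing is soft-clipped there.
    """
    if flag & 0x4 or cigar == "*":
        return ""
    # Hard clips (H) are absent from SEQ, so they are skipped.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return ""
    if flag & 0x10:
        # Reverse-strand alignment: SEQ is stored reverse-complemented,
        # so the read's 3' end is the leading soft clip.
        n, op = ops[0]
        if op != "S" or n == 0:
            return ""
        return seq[:n].translate(COMPLEMENT)[::-1]
    n, op = ops[-1]
    if op != "S" or n == 0:
        return ""
    return seq[-n:]


def is_poly_a(tail, min_length=5, min_fraction=0.8):
    """Crude check: the tail is long enough and mostly A."""
    return len(tail) >= min_length and tail.count("A") / len(tail) >= min_fraction
```

For a forward-strand record with CIGAR `42M8S`, the function returns the last 8 bases of SEQ; combining it with the alignment end position relative to the annotated snoRNA 3' end would tell you whether the clipped bases sit directly after the mature sequence.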

ADD COMMENT • link 10.3 years ago by Ido Tamir 5.2k

0

Hi Ido,

As I don't have a bioinformatics background, I couldn't follow this fully. Could you please take the time to explain it in more detail?

Thanks

Sethu

ADD REPLY • link 10.3 years ago by sethugunja ▴ 60