Hi,
We recently sequenced our RNA samples, ranging from 70 to 200 nt, on the Illumina HiSeq platform. Here are the details:
Type of seq: Small RNA seq (size 70 - 200 nt)
Seq platform: Illumina HiSeq 2000
Read length: 50 nt, single-end
Conditions: Normal (3 replicates) vs Patient (3 replicates)
Reads: ~17 million per replicate
Aim: To identify 3'-end extended sequences (poly(A) tails) in snoRNAs (immature snoRNAs)
Is there a particular pipeline or tool for finding them?
If you have any other suggestions, please let me know.
Sethu
Hopefully others will reply with a premade tool, but I would think the general idea would be to first map the reads as normal, then take the unmapped reads and split them into segments to allow anchoring. You'd then try to map those segments (the anchors). For a read whose anchor maps to the 3' end of a snoRNA, the 3' extension is the remainder of the read beyond the anchor. This is roughly how TopHat works, though it would make more sense to simply write a custom pipeline than to modify TopHat.
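To make the last step concrete, here is a minimal Python sketch of pulling out the 3' extension once an anchor has been placed. It assumes you already know where in the read the anchored portion ends; the function names `three_prime_extension` and `a_fraction` and the example read are made up for illustration and are not part of any existing tool.

```python
def three_prime_extension(read, anchor_end_in_read):
    """Return the part of the read that lies 3' of the anchored portion.

    anchor_end_in_read is the (hypothetical) 0-based offset in the read
    where the last anchored base ends; everything after it is the extension.
    """
    if not 0 <= anchor_end_in_read <= len(read):
        raise ValueError("anchor end must lie within the read")
    return read[anchor_end_in_read:]


def a_fraction(seq):
    """Fraction of A bases in seq (0.0 for an empty sequence)."""
    return seq.upper().count("A") / len(seq) if seq else 0.0


# Made-up read: 18 bases that anchor to a snoRNA 3' end, then a run of As.
read = "GATTACAGATTACAGGCT" + "A" * 10
extension = three_prime_extension(read, 18)
print(extension, a_fraction(extension))
```

A high A fraction in the extension would suggest a poly(A) tail, but where to put the cutoff is a judgment call that depends on your data.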
Thank you for your prompt reply,
Can you please explain how to split and anchor them?
Sethu
Whatever program you write or find would take each read and cut it into non-overlapping segments (of maybe 15-20 bases each). In this context, anchoring simply means mapping these segments to either the genome or a library of small RNA sequences (the latter is probably more efficient).
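As a rough illustration of the segmenting and anchoring described above, here is a small Python sketch. It is a simplification under several assumptions: the segment length of 18 is just a value in the suggested 15-20 range, the "library" is a plain dict of made-up sequences, and anchoring is done by exact string matching. In practice you would hand the segments to a short-read aligner that tolerates mismatches. The names `split_read` and `anchor_segments` are hypothetical.

```python
def split_read(read, seg_len=18):
    """Cut a read into non-overlapping segments of seg_len bases.

    The final segment may be shorter than seg_len.
    """
    return [read[i:i + seg_len] for i in range(0, len(read), seg_len)]


def anchor_segments(read, library, seg_len=18):
    """Return (name, ref_pos, read_offset) for each full-length segment
    that matches a library sequence exactly (first occurrence only)."""
    hits = []
    for idx, seg in enumerate(split_read(read, seg_len)):
        if len(seg) < seg_len:
            continue  # leftover piece is too short to anchor reliably
        for name, ref in library.items():
            pos = ref.find(seg)
            if pos != -1:
                hits.append((name, pos, idx * seg_len))
    return hits


# Made-up example: the read covers the last 18 bases of a fake snoRNA
# and then runs on into an untemplated stretch of As.
library = {"snoRNA_X": "ACGTTGCAAGGCTTAACCGGATCGATTACG"}
read = library["snoRNA_X"][-18:] + "A" * 18
print(split_read(read))
print(anchor_segments(read, library))
```

Here the first segment anchors at the 3' end of the library sequence and the second (all As) does not anchor anywhere, which is the pattern you would look for in a read carrying a 3' extension.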