likelihood of number of consecutive Ns in miRNA sequence
4.6 years ago
realnewbie ▴ 30

Hi, Dear community,

I have an interesting question. I am planning to count sequences that contain at least some number of consecutive identical nucleotides (e.g. sequences with >=5 consecutive Gs) as empty product. Do you have any idea about the likelihood of consecutive Ns in miRNA sequences?

P.S. I am not asking about the expected 25% frequency of each base. I am asking how likely GGGGG-like patterns are in short miRNA sequences.

Many thanks, Best regards,
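As a baseline for comparison, here is a minimal sketch of how likely such a run would be purely by chance, assuming every base is drawn independently and uniformly from four letters (real miRNAs are not random, so this is only a null expectation). It is written in Python; the function name `prob_homopolymer_run` and the example length of 22 nt are illustrative choices, not something taken from a database.

```python
from fractions import Fraction


def prob_homopolymer_run(n, k, alphabet_size=4):
    """Probability that a random sequence of length n contains a run of
    at least k identical letters, assuming independent, uniform letters."""
    if k <= 1:
        return Fraction(1) if n >= 1 else Fraction(0)
    if n < k:
        return Fraction(0)
    p_same = Fraction(1, alphabet_size)  # next letter equals the previous one
    p_diff = 1 - p_same                  # next letter differs, run restarts

    # state[r]: probability that no run of length k has occurred so far
    # and the run ending at the current position has length r (1 <= r < k).
    state = [Fraction(0)] * k
    state[1] = Fraction(1)  # the first letter always starts a run of length 1
    hit = Fraction(0)       # probability that a run of length k has occurred

    for _ in range(n - 1):
        new = [Fraction(0)] * k
        for r in range(1, k):
            p = state[r]
            if p == 0:
                continue
            new[1] += p * p_diff
            if r + 1 == k:
                hit += p * p_same
            else:
                new[r + 1] += p * p_same
        state = new
    return hit


if __name__ == "__main__":
    # Chance of any AAAAA/CCCCC/GGGGG/UUUUU-like run in a random 22-mer.
    print(float(prob_homopolymer_run(22, 5)))
```

For a single specific base (only GGGGG, say) the chance is lower than the any-base value computed here. Comparing this null expectation with the observed frequency in real miRNA sequences shows whether such runs are enriched or depleted.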

mirna likelihood • 785 views

Why should there be any N's in sequence for a run that has worked well?


I think the OP means any mononucleotide kmer?


Ah yes. That must be it.

4.6 years ago

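This should be as simple as downloading the hairpin or mature sequences (e.g. hairpin.fa from miRBase) and searching them for your patterns. For example, to report the first homopolymer run of 5 or more bases in each hairpin (first few hits shown):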

perl -ne 'BEGIN{$x=5} sub hit{print $def.$1."\n" if $seq=~/(A{$x,}|C{$x,}|G{$x,}|U{$x,})/} if(/^>/){hit();$seq="";$def=$_}else{chomp;$seq.=$_} END{hit()}' hairpin.fa
>cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop
UUUUUU
>cel-mir-41 MI0000012 Caenorhabditis elegans miR-41 stem-loop
UUUUU
>cel-mir-42 MI0000013 Caenorhabditis elegans miR-42 stem-loop
UUUUU
>cel-mir-49 MI0000020 Caenorhabditis elegans miR-49 stem-loop
AAAAA
>cel-mir-51 MI0000022 Caenorhabditis elegans miR-51 stem-loop
AAAAA
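To turn hits like these into an actual frequency, one option is to count how many records contain a run at all. The following is a rough Python sketch of that idea, not a translation of the one-liner above: the helper names (`read_fasta`, `homopolymer_runs`, `fraction_with_runs`) and the two toy records are made up for illustration, and it reports every run per sequence rather than only the first.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:  # do not forget the last record
        yield header, "".join(chunks)


def homopolymer_runs(seq, min_len=5, alphabet="ACGU"):
    """Return all maximal runs of one repeated base with length >= min_len."""
    pattern = re.compile(r"([%s])\1{%d,}" % (re.escape(alphabet), min_len - 1))
    return [m.group(0) for m in pattern.finditer(seq.upper())]


def fraction_with_runs(records, min_len=5):
    """Return (records containing a run, total records)."""
    hits = total = 0
    for _header, seq in records:
        total += 1
        if homopolymer_runs(seq, min_len):
            hits += 1
    return hits, total


if __name__ == "__main__":
    # Hypothetical toy records, only to show the call pattern.
    toy = [">toy-1", "ACGUUUUU", "UA", ">toy-2", "ACGUACGU"]
    hits, total = fraction_with_runs(read_fasta(toy))
    print(f"{hits} of {total} sequences contain a run of >=5 identical bases")
```

On real data the list of lines would be replaced by an open file handle, for example `with open("hairpin.fa") as fh: fraction_with_runs(read_fasta(fh))`. Running it on the mature sequences instead of the hairpins is closer to what the question asks, since mature miRNAs are much shorter.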