How can I use a short sequence as a probe to fish out sequences have the same part out from raw illumina paired reads files?
0
0
Entering edit mode
3.4 years ago

Dear all,

I have 150bp paired end sequences R1 and R2, after I clean the sequences I used sortmerna to sort out all the reads matching rna databases. so I have two fasta files containing reads like this:

@NB551184:334:HKHYYAFX2:1:11101:8313:12479 2:N:0:0
AAGGTGATATGAACCGTTATAGCCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGACACACTATCATTACGTGAATACATAGCGTAATGAAGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAG
+
AAAAAEEEEEEEEEEEAAEEEEAEEEA<EAEEEEEEEEAEEEEEEEAAEE<E<AEAEEA/EEAEEAEEEEE6EEAEAEEEAEEEEEE/EEEE<EEEAEA<EEE/EE/E//<AAEAAEEE/EEAA<<EAAAE<EE6A<//EEEEEA<EEA/
@NB551184:334:HKHYYAFX2:1:11101:26630:12486 2:N:0:0
TAAAAAGCTCGTAGTTGAACCTTGGGCCTGGCTGGCCGGTCCGCCTCACCGCGTGCACTGGTCCGGCCGGGCCTTTCCCTCTGGGGAACCGCATGCCCTTCACTGGGTGTGTCGGGGAACCAGGACTTTTACTTTGAACAAATT
+
AAAAAEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEAEEEE<EEEEEAEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA/EAEEEEEEEEEAEEEEEAA<EEEEEAAEEA/EEEEEEEEAAEEEEAAAEEEEAEE<

And I have four sequences(each is 25bp, with minor differences) that I am interested in and I want to know from all the rna 150bp reads, what's the ratio between these four sequences.

How could I do this?

The original idea here is that: my bacterium has 6 16S, and each one is slightly different from each other, so we want to know under different conditions, how do those different 16S expressed. And we found the differences of these 16S are mainly at a short sequence (25bp), like below:

  1. CATATCTGACCGGCATCGGTTGGAT
  2. TATATCTGACCGGCATCGGTTGGAT
  3. TATATCTGACCGGCATCGGTTAGCT
  4. CATATCTGACCGGCATCGGTTAGCT

If we make contigs using all these reads then it could mess up with the combination, of T or C at the beginning, or G.A or A.C at the very end. So I decided to used these four 25bp sequences as probe to fish out those reads matching them, and count the number of reads.

Can anyone help with the code or software I can use?

Any idea and help will be highly appreciated!

Thanks
Yanfang

rna paired • 1.0k views
ADD COMMENT
0
Entering edit mode

What have you tried? Surely you started with grep?

ADD REPLY

Login before adding your answer.

Traffic: 2119 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6