Hi guys, I intend to sequence transcripts (cDNA) from a certain human cell line culture. I'll be acquiring all the reads using the MinION. The reads are expected to be 2-10 kb (I don't know the length distribution yet).
From all the reads I obtain, I'd like to move forward to analysis with only certain reads:
- Reads that have a certain known tagging sequence (45 bases at the 3' end).
- Reads that have an expected, known open reading frame (~2.5 kb).
Obviously, I'd have to optimize the balance between sensitivity and specificity when targeting these reads.
Is there a tool designed for this specific type of screening? If not, would you suggest writing code from scratch or adapting existing code that does something similar?
I'm a novice with UNIX but have coding experience (MATLAB, C), so please keep that in mind when answering.
Thanks.
It's probably important to note that the data is noisy, so exact matching of the known tagging sequence won't give you the best results. You'll probably need to do some fuzzy (error-tolerant) matching of your "recognition sequence".
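To make the fuzzy matching idea concrete, here is a minimal sketch in plain Python. It computes the smallest edit distance (substitutions, insertions, deletions) between the tag and any stretch of the last few bases of a read. The function names, the `max_edits` and `window` defaults, and the short tag in the usage note are all placeholders I made up for illustration; they are not taken from any existing tool, and the thresholds would need tuning on your own data to get the sensitivity/specificity balance you want.

```python
def best_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`."""
    # Row for the empty pattern: it matches anywhere in the text at zero cost,
    # which is what lets the match start at any position.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        # Column 0: the first i pattern bases aligned against nothing.
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match (cost 0) or substitution (cost 1)
                prev[j] + 1,             # pattern base missing from the read
                curr[j - 1] + 1,         # extra base in the read
            )
        prev = curr
    # The match may end anywhere in the text, so take the best end position.
    return min(prev)


def has_tag(read: str, tag: str, max_edits: int = 6, window: int = 80) -> bool:
    """True if `tag` occurs in the last `window` bases with <= `max_edits` edits.

    `max_edits` and `window` are hypothetical defaults, not recommendations.
    """
    return best_match_distance(tag, read[-window:]) <= max_edits
```

For example, with a made-up 20-base tag, `has_tag("GATTACA" * 5 + "ACGTTGCAAGCTTGACCTGA", "ACGTTGCAAGCTTGACCTGA")` is true, and it stays true if one base of the tag in the read is changed. Raising `max_edits` catches more true tagged reads but also lets in more false ones. If your library prep yields reads in both orientations, you would also need to check the reverse complement of the tag at the start of the read. This pure-Python loop is slow for a full run, so treat it as a way to understand the approach rather than something to use as-is.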
Since you are looking for cDNA sequences, I'm not sure why the second requirement would be an issue?