Hi all,
I have sequenced some plasmid DNA using Nanopore sequencing (the whole plasmid was sequenced). I am only interested in one region, so I would like to extract just 425 bp from a specified location in each 4600 bp read in the FASTA file (around 10000 reads), i.e. keep everything after atgacccg, or equivalently delete everything before this sequence.
Would someone be as kind as to help me do this? I've tried a few tools, but they fail with the long nanopore reads.
Many thanks in advance
Hi, many thanks for this, it worked like a charm! If I could ask for your help just one more time: the data is a little noisy, and the script appears to retain reads that do not contain this sequence. Would you happen to know how to modify the above script to discard any reads that do not contain the specified sequence? Many thanks again, zack
When you linearize, you could prefix the input to the sed command with a grep for atgacccg. That way, only sequences matching atgacccg are retained and processed by sed.
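If a scripted alternative to the grep/sed pipeline is useful, here is a minimal Python sketch of the same idea. It is not the script referred to above. It assumes the 425 bp are counted from the start of atgacccg, that matching is exact and case-insensitive, and that only the forward strand is searched. The names read_fasta, trim_reads and trim_fasta_file are made up for this illustration.

```python
from typing import Iterable, Iterator, Tuple

MOTIF = "atgacccg"  # anchor sequence from the question
KEEP_LEN = 425      # assumption: bases kept, counted from the start of the motif


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_reads(records, motif: str = MOTIF, keep_len: int = KEEP_LEN):
    """Keep only reads that contain the motif, trimmed to start at it."""
    motif = motif.lower()
    for header, seq in records:
        pos = seq.lower().find(motif)  # first exact, case-insensitive match
        if pos == -1:
            continue  # discard reads that do not contain the motif
        yield header, seq[pos:pos + keep_len]


def trim_fasta_file(in_path: str, out_path: str) -> int:
    """Write the trimmed reads to out_path and return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in trim_reads(read_fasta(src)):
            dst.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

Calling trim_fasta_file("reads.fasta", "trimmed.fasta") (both file names are placeholders) would write only the reads containing the motif. Because the match is exact, a read with a sequencing error inside atgacccg, or one sequenced from the reverse strand, would be discarded, which may matter with noisy Nanopore data.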