As a follow-up to an earlier question, I am trying to figure out how to use seqkit to rename duplicated read names.
Unfortunately, the documentation and help for rename are very limited, and I don't see how the example listed can be applied to a case like mine:
https://bioinf.shenwei.me/seqkit/usage/#rename
Specifically, if I have a FASTQ file in which two different reads have the same name, how do I append _N to the second occurrence of that name?
In other words, given file.fastq, what input and output arguments do I need so that
seqkit rename <some arguments> file.fastq <some arguments> outfile.fastq
returns outfile.fastq with duplicate names changed to <name>_N?
What have you tried? Create a model input from your FASTQ (with ~10 unique reads and maybe 1-2 duplicates) and test against it. The manual is pretty straightforward on how to use the tool. Pro-tip: use the -n option just in case. Try -n with the examples below to see the difference. Note that seqkit rename will not give you <full_name_line>_N but <id_part>_N <rest_of_name_line>.
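To make the <id_part>_N <rest_of_name_line> pattern concrete, here is a minimal Python sketch of the renaming logic. It is an illustration only, not seqkit's implementation: the function name rename_duplicates, the by_name parameter (meant as a rough analogue of -n, i.e. detecting duplicates by the full name line instead of the ID alone), and the sample headers are all hypothetical, and the sketch assumes that numbering starts at 2 for the second occurrence. The real command is probably along the lines of seqkit rename file.fastq -o outfile.fastq, optionally with -n, but check seqkit rename --help on your installed version.

```python
from collections import Counter


def rename_duplicates(headers, by_name=False):
    """Append _N to the ID part of the 2nd, 3rd, ... occurrence of a header.

    headers: name lines without the leading '@'.
    by_name: if True, duplicates are detected on the full name line
             (assumed analogue of -n); otherwise on the ID part only.
    """
    seen = Counter()
    renamed = []
    for header in headers:
        # ID is the text before the first whitespace; the rest is the description.
        parts = header.split(None, 1) or [""]
        seq_id = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        key = header if by_name else seq_id
        seen[key] += 1
        if seen[key] > 1:
            # The suffix goes on the ID, not at the end of the whole name line.
            seq_id = f"{seq_id}_{seen[key]}"
        # Note: whitespace between ID and description is normalised to one space.
        renamed.append(f"{seq_id} {rest}".rstrip())
    return renamed


# Hypothetical test headers: three reads share the ID "read1".
headers = ["read1 lane=1", "read2 lane=1", "read1 lane=2", "read1 lane=1"]
print(rename_duplicates(headers))                # duplicates detected by ID
print(rename_duplicates(headers, by_name=True))  # duplicates detected by full name
```

With duplicates detected by ID, the third and fourth headers become read1_2 lane=2 and read1_3 lane=1. With by_name=True, only the fourth header (an exact repeat of the first name line) is renamed, to read1_2 lane=1. Running seqkit on a small test file like this is a quick way to confirm whether the tool matches the sketch.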