cat myseq.fasta | grep '>' | head -n 5
>PZ7180000000004_TX nReads=26 cov=9.436
>PZ7180000031590 nReads=3 cov=2.59465
>PZ7180000027934 nReads=5 cov=2.32231
>PZ456916 nReads=1 cov=1
>PZ7180000037718 nReads=9 cov=6.26448
The IDs represent a unique identifier; those that have a _ suffix have been grouped by sequence similarity based on BLAST results. For example, PZ7180000000447_A and PZ15399_A are in the same group, while PZ7180000000079_AF and PZ5729_AF are together in a different group. Those without any _ suffix had no homology to other sequences, and so weren't put in a group.
I want to put the ungrouped sequences (those without a _ suffix) in a file named no_suffix.fasta.
I want to do this using only awk, grep and sed. Please help.
Why only awk, sed and grep? Is this an assignment?
No. I have studied those tools, and I also understand bioawk. I tried Python, but it didn't work for me; it was tough :(
It might be a good idea to use bioawk then. You should not have to design yet another parser for an ad hoc task.
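If plain awk is still preferred, something along these lines might work. This is only a sketch: it assumes the ID is the first whitespace-delimited field of each header line, that a _ anywhere in that field marks a grouped sequence, and that sequences may span several lines. The function name extract_no_suffix is made up for illustration.

```shell
# Hypothetical helper: copy records whose ID has no "_" to a new FASTA file.
# $1 = input FASTA, $2 = output FASTA.
extract_no_suffix() {
    awk '
        /^>/ { keep = ($1 !~ /_/) }   # on each header, decide whether to keep the record
        keep                          # print the header and its sequence lines while keep is true
    ' "$1" > "$2"
}

# Usage:
#   extract_no_suffix myseq.fasta no_suffix.fasta
```

Only the ID field ($1) is tested, so an underscore elsewhere in the header (for example in a description) would not exclude a record. Flipping the test to `$1 ~ /_/` should give the grouped sequences instead.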