8.8 years ago
Gabe Anderson
Hello,
I have a data set that looks like this:
>JAMESBROWN_1_FC20423AAXX_7_1_82_883
GTTAGAGGTTCGAAG
>JAMESBROWN_1_FC20423AAXX_7_1_198_886
GGCTCAGTGGTCTAGTGGTATGATTCTCGCTT
>JAMESBROWN_1_FC20423AAXX_7_1_115_888
GGGGGTGTAGGGTGGGGTTGG
>JAMESBROWN_1_FC20423AAXX_7_1_99_894
GTTCGTATCCCACTTCTGACACCA
>JAMESBROWN_1_FC20423AAXX_7_1_226_900
GCAAACTGTGCGTCATCGTGT
And I'd like to edit it to look like this:
>cel1_count=3
TGCCTTGTCTGTCCTAAAAATC
>cel2_count=9
GTTAAGTGGGAAACGATGT
>cel3_count=7
CCGACCTTGAAATACCAC
>cel4_count=7
TAGAAATCCACTATGCTTTGG
>cel5_count=5
CGCGGGTGAGCAGCCTGGTAGCTCGTC
The count in the header line specifies the number of times a sequence occurs in the data set. Kindly assist. Thanks!
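One way to approach this, as a minimal sketch rather than a definitive solution: collapse identical sequences and write the number of occurrences into each new header. The function name `collapse_fasta`, the `cel` prefix default, and the short example reads are assumptions for illustration. The sketch also assumes each record has its sequence on a single line, as in the sample above.

```python
from collections import Counter


def collapse_fasta(lines, prefix="cel"):
    """Collapse identical sequences; return FASTA lines with counts in the headers.

    Assumes one sequence line per record (no wrapped sequences).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the original header lines; only sequences are counted.
        if not line or line.startswith(">"):
            continue
        counts[line] += 1

    out = []
    # Counter keeps first-seen order (Python 3.7+), so the input order is preserved.
    for i, (seq, n) in enumerate(counts.items(), start=1):
        out.append(f">{prefix}{i}_count={n}")
        out.append(seq)
    return out


# Hypothetical input: the first and third reads are identical.
example = [
    ">read_1", "GTTAGAGGTTCGAAG",
    ">read_2", "GGGGGTGTAGGGTGGGGTTGG",
    ">read_3", "GTTAGAGGTTCGAAG",
]
print("\n".join(collapse_fasta(example)))
```

For a real file, the lines could be read with `open(path)` and the returned list written back out joined by newlines. Because the sequences are only counted and never modified, the bases in the output are identical to those in the input.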
Thanks for your input. For some reason, the reads themselves appear to be altered, not just the header lines: a lot of bases seem to be replaced by A. Here is what my result looked like:
The command is not doing anything to your sequences other than counting them. My guess is that, because the sequences are sorted, those starting with "A" appear first in your output. The original order is not maintained.
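To illustrate the point about sorting, here is a small sketch with made-up reads (the list `reads` is hypothetical, not taken from the data set above). Sorting changes only the order of the sequences, not their content, so sequences beginning with "A" move to the top.

```python
# Hypothetical reads, in their original order.
reads = [
    "GTTAGAGGTTCGAAG",
    "AGGCTCAGTGGTCTAG",
    "GGGGGTGTAGGGTGG",
    "AACGTATCCCACTTC",
]

# Lexicographic sort, similar in effect to sorting lines before counting them.
sorted_reads = sorted(reads)

# The first entries now start with "A", but every read is still present unchanged.
print(sorted_reads)
print(set(sorted_reads) == set(reads))
```

Looking only at the top of a sorted output can therefore give the impression that bases were replaced by A, when the same sequences are simply listed in a different order.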
Thank you, Pierre and Goutham! It's clear now.