I have a large CSV file (1.7 GB) containing sequences, and I have to add a header to each sequence, so I did something like this in bash:
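#!/bin/bash
# Work on a copy so the original CSV stays untouched.
cat con_test.csv > out.out
for file in out.out; do
    # Prepend the header text (followed by a newline) to every line, in place.
    sed -e 's/^/>NZ_CP00000.1 volvox complete genome\n/' -i "$file"
done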
My input file:
AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC
GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA
CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG
TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA
AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG
My output file after running this script (every sequence gets the same accession, NZ_CP00000.1):
NZ_CP00000.1 volvox complete genome AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC
NZ_CP00000.1 volvox complete genome GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA
NZ_CP00000.1 volvox complete genome CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG
NZ_CP00000.1 volvox complete genome TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA
NZ_CP00000.1 volvox complete genome AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG
Now I want to append a unique value to the accession number of each sequence, so that the description line looks something like this (e.g. NZ_CP00000.1_000000001), with the value incremented for every sequence:
>NZ_CP00000.1_000000001 volvox complete genome AAAAAAAATGTGCTCCGGCCTCCGCGAAATTCGCGACGCCGGCCGCGTGGGCATGCACGTC
>NZ_CP00000.1_000000002 volvox complete genome GGCCGTTACCTGGAGCCAGCGGGACTCGAAGGATGCCCCACGATGAGTTCAGCAGCAATGA
>NZ_CP00000.1_000000003 volvox complete genome CCAAGCCTGCGCGTGCCCTGCGTGGTTCTTCCCCACAGCAGCACACCGTGAGGGCAAACTG
>NZ_CP00000.1_000000004 volvox complete genome TCGCCGCACGTTCGGGCAAAAAAACCTGACGTGCGCGGTCTTGTAAAGCGGTTAGTCACCGA
>NZ_CP00000.1_000000005 volvox complete genome AGGGCACGCGGGGCCGATTCGCACCGGCCGAGGTCTGCCCAAGGCAACCCCTAGAGTCTAG
How can I achieve this?
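One possible approach, as a minimal sketch rather than a definitive solution: let awk keep a counter and print it zero-padded to nine digits with `printf`. The function name `add_headers` and its two optional arguments are hypothetical, chosen only for illustration. The sketch assumes one sequence per input line, skips blank lines, and writes the header on its own line above the sequence (as the `\n` in the sed command suggests); replace the `\n` after `%s` in the format string with a space to keep header and sequence on one line.

```bash
#!/bin/bash
# add_headers (hypothetical helper): read one sequence per line on stdin,
# write ">ACCESSION_NNNNNNNNN description" plus the sequence on stdout.
#   $1 = accession   (assumed default: NZ_CP00000.1)
#   $2 = description (assumed default: volvox complete genome)
add_headers() {
    awk -v acc="${1:-NZ_CP00000.1}" -v desc="${2:-volvox complete genome}" '
        # NF is non-zero only for non-blank lines; n counts emitted records.
        NF { printf(">%s_%09d %s\n%s\n", acc, ++n, desc, $0) }
    '
}
```

Usage would be something like `add_headers < con_test.csv > out.fasta`. Because awk streams the input line by line, this should cope with a 1.7 GB file without loading it into memory or editing it in place.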
Thank you so much, it works.
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.