Add a sequence name!
8.7 years ago
fufuyou ▴ 110

Hi, my sequence files look like this:

TAACAGTGGGTCGAGATAAGGAC    1
CGCCCGGACGTCAGAGAAAAAGGTGCCAGCGCGGCGCAGAGAATGGAATT    1
TTAGTATGACGGGCTGACACGAGA    3

I want to add a sequence name to each sequence, like this:

>seq000001 1
TAACAGTGGGTCGAGATAAGGAC
>seq000002 1
CGCCCGGACGTCAGAGAAAAAGGTGCCAGCGCGGCGCAGAGAATGGAATT
>seq000003 3
TTAGTATGACGGGCTGACACGAGA

Thanks, Fuyou

RNA-Seq • 1.6k views

You seem to enjoy using custom-made formats :P

Where does the sequence name data come from, Fuyou?


Hi John, thanks. The sequence names can be arbitrary. Sequential names like the ones I wrote (seq000001, seq000002, seq000003) are fine. Thanks

8.7 years ago
venu 7.1k

Assuming every sequence is on a single line and you need to create the sequence headers, first count the lines:

wc -l sequences.fa

Let's say you've got 100 lines. Generate one zero-padded ID per line (overwriting ids.txt rather than appending to it):

for i in {1..100}; do printf 'seq%06d\n' "$i"; done > ids.txt

paste ids.txt sequences.fa | awk '{print $1"-"$3"\t"$2}' | tr '\t' '\n' | sed 's/^/>/;n' | sed 's/-/ /g' > new_file.fa

Output

>seq000001 1
TAACAGTGGGTCGAGATAAGGAC
>seq000002 1
CGCCCGGACGTCAGAGAAAAAGGTGCCAGCGCGGCGCAGAGAATGGAATT
>seq000003 3
TTAGTATGACGGGCTGACACGAGA
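
The manual line count can be avoided by letting awk number the IDs directly from the input. This is only a sketch of the same paste-based idea, not a replacement for the commands above: it assumes each line holds a sequence and a count separated by whitespace, and it runs in a scratch directory created with mktemp so that no existing file is overwritten. The two sample lines are taken from the question.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question: sequence, whitespace, count.
printf '%s\n' \
    'TAACAGTGGGTCGAGATAAGGAC    1' \
    'TTAGTATGACGGGCTGACACGAGA    3' > sequences.fa

# One zero-padded ID per input line, so no manual line count is needed.
awk '{ printf "seq%06d\n", NR }' sequences.fa > ids.txt

# paste joins ID and record with a tab; awk then sees
# $1 = ID, $2 = sequence, $3 = count.
paste ids.txt sequences.fa | awk '{ print ">" $1 " " $3; print $2 }' > new_file.fa
```

Writing the header and the sequence with two print statements also removes the need for the tr and sed steps.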

Playing awk-golf:

awk '{printf ">seq%06d %d\n%s\n" , NR,$2,$1}' sequences.fa > new_sequence.fa

This takes the line number (NR) as the number for the seq-id, appends the trailing count to the FASTA header, and prints the sequence on a separate line.

All under the assumption that each line holds one sequence followed by the number.
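
If that assumption might not hold for every line, the same awk idea can check it instead of trusting it. The following is a minimal sketch, not part of the original one-liner: add_headers is a hypothetical helper name, blank lines are skipped, and lines that do not have exactly two fields are reported on stderr. It keeps its own counter instead of NR so the IDs stay consecutive when lines are skipped.

```shell
# Hypothetical helper: convert "SEQUENCE COUNT" lines to FASTA records.
# Reads the files given as arguments, or stdin when there are none.
add_headers() {
    awk '
        NF == 0 { next }    # skip empty lines
        NF != 2 {           # report malformed lines and skip them
            printf "line %d: expected 2 fields, got %d\n", NR, NF > "/dev/stderr"
            next
        }
        # n counts only the valid records, so IDs have no gaps
        { printf ">seq%06d %s\n%s\n", ++n, $2, $1 }
    ' "$@"
}

# Example with two records from the question and a blank line in between.
printf '%s\n' \
    'TAACAGTGGGTCGAGATAAGGAC    1' \
    '' \
    'TTAGTATGACGGGCTGACACGAGA    3' | add_headers
```

With a file, the call would be add_headers sequences.fa > new_sequence.fa.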


Intelligent way of using awk.


Thanks Michael, It is great. Fuyou


This is great venu - thank you :)
