Fasta Conversion
13.0 years ago
Syawash ▴ 30

Hi there, is there a way to change the identifiers in a FASTA file? For example, from

>fastsdde135667667
actgcagtctga
>fgdte12875
actggact

to

>Seq1
actgcagtctga
>Seq2
actggact
13.0 years ago

Use awk to replace each header line with a running counter:

awk '/^>/ { printf(">Seq%d\n",(++i)); next;} { print }' < input.fa > output.fa

Example:

echo ">fastsdde135667667
actgcagtctga
>fgdte12875
actggact" | awk '/^>/ { printf(">Seq%d\n",(++i)); next;} { print }'

>Seq1
actgcagtctga
>Seq2
actggact

Hi Pierre. Just curious whether you can pad the numbers with zeroes using only awk, e.g. seq1 --> seq0001, seq253 --> seq0253. Happy New Year :)

@Eric, yes, it works like the standard C printf: printf(">Seq%04d\n",(++i)) pads the counter to four digits (Seq0001, Seq0253).
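
To make the padding concrete, here is a minimal sketch that wraps the awk one-liner in a shell function. The function name `renumber_fasta` and its two optional arguments (prefix and pad width) are hypothetical and not part of the original answer; the format string is built in a BEGIN block so the width can be passed in as a variable.

```shell
# Hypothetical helper (not from the thread): renumber FASTA headers read from stdin.
# $1 = header prefix (default "Seq"), $2 = zero-padded width (default 4).
renumber_fasta() {
    awk -v prefix="${1:-Seq}" -v width="${2:-4}" '
        BEGIN { fmt = ">%s%0" width "d\n" }       # e.g. ">%s%04d\n" for width 4
        /^>/  { printf(fmt, prefix, ++i); next }  # header line: replace with prefix + counter
              { print }                           # sequence line: pass through unchanged
    '
}

# Demo on the two records from the question.
printf '>fastsdde135667667\nactgcagtctga\n>fgdte12875\nactggact\n' | renumber_fasta Seq 4
```

With a width of 4 the demo prints >Seq0001 and >Seq0002 as headers and leaves the sequence lines untouched. On a real file it would be called as `renumber_fasta Seq 4 < input.fa > output.fa`.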

@Pierre, nice! Thanks. Have to learn more C and C++ some time.

13.0 years ago
Daniel ★ 4.0k

Also, this Perl script does the same:

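#!/usr/bin/perl
use strict;
use warnings;

my $count = 1;    # running sequence number

while (<>) {
    # Rewrite header lines as >SeqN; the trailing newline is left in place
    if (s/^>.*/>Seq$count/) {
        $count++;
    }
    print;        # headers (rewritten) and sequence lines (unchanged)
}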

>Seq1
actgcagtctga
>Seq2
actggact
13.0 years ago

With Biopieces and add_ident:

read_fasta -i input.fa | add_ident -k SEQ_NAME -p Seq | write_fasta -o output.fa -x