How To Reformat A Fasta File To A Custom Tabular Format
11.7 years ago
2011101101 ▴ 110

I have a FASTA file, test.fa, that looks like this:

>WP1_11_x10
CCATGCGCGGGTTCAATTCCTGTCGTTCGACCC
>WP1_16_x1
ACAAGCGAAGGCTCCTCAACGACGCCTCATCGGATG
>WP1_17_x1
AGAACAAGATTGTTGAAAACTTGAGGA
>WP1_21_x11
CCTGGGATGCGCAAGGAAGCTGAC

I want the format below. The numbers (10, 1, 1, 11) are taken from the FASTA headers.

CCATGCGCGGGTTCAATTCCTGTCGTTCGACCC   10
ACAAGCGAAGGCTCCTCAACGACGCCTCATCGGATG    1
AGAACAAGATTGTTGAAAACTTGAGGA     1
CCTGGGATGCGCAAGGAAGCTGAC  11

Can anyone help me? Thank you very much!

perl awk format • 4.4k views
11.7 years ago

Hi 21.., please see the Perl code below.

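#!/usr/bin/perl -w

# Usage: perl script.pl test.fa
open(F1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

while (<F1>) {
    if ($. % 2 != 0) {          # odd lines are headers
        @val = split(/_/);      # e.g. ">WP1_11_x10" -> (">WP1", "11", "x10\n")
        $cnt = $val[1];         # second underscore-separated field
    }
    else { chomp($_); print "$_ $cnt \n"; }   # even lines are sequences
}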

This code is not written defensively (it assumes every record is exactly one header line followed by one sequence line), but it should work without problems on input like yours.
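For what it's worth, here is a rough sketch of what a more defensive version might look like, written in awk rather than Perl. It assumes headers end in `_x<digits>` (as in the example in the question) and that sequences may be wrapped over several lines. The function name `fasta_to_tab` is made up for illustration, and records whose header does not match that shape are silently skipped, which may or may not be what you want.

```shell
#!/bin/sh
# fasta_to_tab: hypothetical helper name, not something from this thread.
# Prints "sequence<TAB>count" per record, where count is the digits
# after the final "_x" in the header line.
fasta_to_tab() {
    awk '
        # Print the buffered record, if complete, then reset the sequence buffer.
        function emit_record() {
            if (seq != "" && count != "") print seq "\t" count
            seq = ""
        }
        { sub(/\r$/, "") }                 # tolerate Windows line endings
        /^>/ {
            emit_record()
            count = ""
            if ($0 ~ /_x[0-9]+$/) {        # assumed header shape: ..._x<digits>
                count = $0
                sub(/^.*_x/, "", count)    # keep only what follows the last "_x"
            }
            next
        }
        NF { seq = seq $0 }                # join wrapped sequence lines, skip blank ones
        END { emit_record() }
    ' "$@"
}
```

After sourcing or pasting the function into a shell, something like `fasta_to_tab test.fa > test.tab` should write the table to a file; with no file argument it reads from standard input.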


Thank you, but I'm sorry, I gave you the wrong number; I have edited the question. I have the answer now, thank you.

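#!/usr/bin/perl -w

open(F1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

while (<F1>) {
    chomp;                      # strip the newline from headers too, or it ends up in $cnt
    if ($. % 2 != 0) {          # odd lines are headers, e.g. >WP1_11_x10
        @val = split(/_x/);     # the count follows "_x"
        $cnt = $val[1];
    }
    else { print "$_ $cnt\n"; } # even lines are sequences
}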


Consider upvoting the answer, if it was useful for you.

11.7 years ago

Or at the command line:

perl -ane 'if(/>/){($dum,$x)=split(/x/,$F[0]);}else{print "$F[0]\t$x\n";}' test.fa
11.7 years ago
Kenosis ★ 1.3k

Here's another option:

use strict;
use warnings;

my $num;
while (<>) {
    if (/>/) {
        ($num) = /(\d+)$/;
    }
    elsif ( $num and /\S/ ) {
        chomp;
        print "$_\t$num\n";
    }
}

Usage: perl script.pl fastaFile [>resultsFile]

The last, optional parameter will redirect output to a file.

Results on your dataset:

CCATGCGCGGGTTCAATTCCTGTCGTTCGACCC    10
ACAAGCGAAGGCTCCTCAACGACGCCTCATCGGATG    1
AGAACAAGATTGTTGAAAACTTGAGGA    1
CCTGGGATGCGCAAGGAAGCTGAC    11
11.7 years ago

Here is how to do it with Biopieces (www.biopieces.org):

read_fasta -i test.fa | split_vals -k SEQ_NAME -d 'x' | write_tab -k SEQ,SEQ_NAME_1 -x