Reverse step | From abundance matrix into "original" fasta file
7.6 years ago
fibar ▴ 90

Are there tools available to convert an abundance matrix back into something like the original FASTA file, preserving the same information? The matrix looks like this:

sequence   sample1   sample2   sample3   ...
actgg...   43        89        23        ...
actga...   03        53        19        ...

I also have identifiers for each sequence. The output would look like:

>sample1_readIDx
actgg...
>sample1_readIDx
actgg...
...
>sample1_readIDy
actga...

The first sequence should appear 43 times with a sample1 header, 89 times with a sample2 header, and so on.

next-gen amplicon-sequencing data • 1.3k views
7.6 years ago

Using awk:

 awk '/^sequence/ {split($0,header);next;} {for(i=2;i<=NF;++i) {N=int($i);for(x=0;x<N;++x) {printf(">%s_%d\n%s\n",header[i],NR,$1);}}} ' input.txt

Thanks Pierre. It ran. However, it didn't print the headers as I described it in my post. I only see an underscore followed by a number. Were you thinking of an additional step afterwards?


it didn't print the headers as I described it in my post

Yes, because I did not understand the nature of this header. Feel free to modify this simple awk script.
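
One possible modification, as a rough sketch rather than a definitive fix. A header of just "_<number>" is what the one-liner would print if the first line of the real file does not begin with the literal word "sequence": the /^sequence/ pattern then never matches and the header array stays empty. The variant below takes the sample names from the first line whatever it contains (NR==1). The "read<row>_<copy>" naming is only an assumption about what readIDx means, and abundance.txt / expanded.fasta are hypothetical file names; the demo input is made up to match the layout in the question.

```shell
# Hypothetical demo input: first column is the sequence, the rest are per-sample counts.
printf 'sequence sample1 sample2\nactgg 2 1\nactga 0 3\n' > abundance.txt

awk '
NR == 1 {                       # first line: remember the sample name of each column
    for (i = 2; i <= NF; i++) header[i] = $i
    next
}
{
    for (i = 2; i <= NF; i++) { # one pass per sample column
        n = int($i)             # count for this sequence in this sample ("03" becomes 3)
        for (x = 1; x <= n; x++)
            # Assumed header scheme: <sample>_read<data row>_<copy number>
            printf(">%s_read%d_%d\n%s\n", header[i], NR - 1, x, $1)
    }
}' abundance.txt > expanded.fasta
```

This relies on awk's default whitespace splitting, so it assumes sample names contain no spaces. If the per-sequence identifiers live in their own column, that column could be printed in place of NR - 1.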
