Rename fasta headers from CSV
5.0 years ago
MSRS ▴ 590

My CSV file looks like this (https://www.biostars.org/p/380879/###courtesy):

201200175|A|name1|175|2012

201200287|A|name2|287|2012

201200845|A|name3|845|2012

My FASTA file headers look like this:

201200175

201200287

201200845

I want the output headers to look like this:

201200175|A|name1|175|2012

201200287|A|name2|287|2012

201200845|A|name3|845|2012

sequence • 971 views

How can I edit this command for this type of data?

seqkit replace --kv-file <(sed "s/,/\t/g" changes.csv) --pattern "^(\d+)_(\w+)" --replacement "{kv}_\${2}" sample.fasta

@ mdshaminurrahman95: the KV file must be tab-separated, with the FASTA header (without the >) in the first column and the replacement value in the second. seqkit can't use the CSV file from the OP as it is.

This should work if the data is exactly as posted in the OP:

input:

$ cat file.fa 
>201200175
atgc
>201200287
agtc
>201200845
atgc


$ cat file.csv 
201200175|A|name1|175|2012
201200287|A|name2|287|2012
201200845|A|name3|845|2012

output:

$ seqkit replace --kv-file <(awk -v OFS="\t" -F "|" '{print $1,$0}' file.csv) -p '(.+)$' -r '{kv}' --quiet file.fa

>201200175|A|name1|175|2012
atgc
>201200287|A|name2|287|2012
agtc
>201200845|A|name3|845|2012
atgc
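
If seqkit isn't available, the same lookup can be sketched with awk alone. This is only a sketch under assumptions: the `rename_headers` function name and the temporary sample files are made up for illustration, the ID is taken to be the first `|`-separated field of each CSV line, and headers with no match in the CSV are left unchanged.

```shell
#!/bin/sh
# Sketch: rename FASTA headers from a pipe-delimited lookup file using awk only.
# rename_headers and the temp files below are illustrative, not from the thread.
rename_headers() {
    # $1 = lookup file (ID is the first "|" field), $2 = FASTA file
    awk -F'|' '
        NR == FNR { map[$1] = $0; next }      # first file: ID -> full line
        /^>/ {
            id = substr($0, 2)                # drop the leading ">"
            sub(/[ \t].*/, "", id)            # keep only the ID before any whitespace
            if (id in map) $0 = ">" map[id]   # unmatched headers pass through unchanged
        }
        { print }
    ' "$1" "$2"
}

# Small demo on throwaway copies of the data from the thread
tmp=$(mktemp -d)
printf '%s\n' '201200175|A|name1|175|2012' '201200287|A|name2|287|2012' > "$tmp/file.csv"
printf '%s\n' '>201200175' 'atgc' '>201200287' 'agtc' '>999' 'gggg' > "$tmp/file.fa"
out=$(rename_headers "$tmp/file.csv" "$tmp/file.fa")
printf '%s\n' "$out"
rm -rf "$tmp"
```

On real data it would be called as `rename_headers file.csv file.fa > renamed.fa`. The CSV must be listed first so the lookup table is built before any header is read.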