Change sequence names in a FASTA file using a dataframe
6.6 years ago
Chvatil ▴ 130

I have a problem; let me explain.

I have a FASTA file such as:

>seqA
AAAAATTTGG
>seqB
ATTGGGCCG
>seqC
ATTGGCC
>seqD
ATTGGACAG

and a dataframe:

seq name      New name seq
seqB            BOBO
seqC            JOHN

I simply want to replace a sequence ID in the FASTA file with the corresponding "New name seq" value whenever that ID appears in the "seq name" column of my dataframe. This would give:

New FASTA file:

>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG

Thank you very much

pandas python fasta • 4.9k views

Outside R:

Export your data frame to a file without the header row (say the file is test.txt). For the example above, test.txt would contain the following (tab-separated):

seqB    BOBO
seqC    JOHN
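
If the data frame lives in pandas, the export step could look roughly like the sketch below. It assumes pandas is installed and that the columns are named as in the question; `write_mapping` is a hypothetical helper name, not something from the original posts.

```python
import pandas as pd


def write_mapping(df, path):
    """Write a two-column data frame as a headerless, tab-separated file."""
    # header=False drops the column names, index=False drops the row index,
    # so each line is just "old_name<TAB>new_name".
    df.to_csv(path, sep="\t", header=False, index=False)


# Example data frame using the column names from the question (assumed).
df = pd.DataFrame({"seq name": ["seqB", "seqC"], "New name seq": ["BOBO", "JOHN"]})
write_mapping(df, "test.txt")
```

The resulting test.txt can then be passed to seqkit as the key-value file.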

Run the following command on the example FASTA file above:

$ seqkit replace -p '(.+)' -r '{kv}' -K -k test.txt test.fa > test2.fa

output:

$ cat test2.fa 
>seqA
AAAAATTTGG
>BOBO
ATTGGGCCG
>JOHN
ATTGGCC
>seqD
ATTGGACAG

Download seqkit from here: http://bioinf.shenwei.me/seqkit/download/


Thanks for your help, but is there a solution in Python?

6.6 years ago
Chirag Parsania ★ 2.0k

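This can be done with the R Biostrings library:

library(Biostrings)

## load fasta file into R (use only the reader that matches your data)
## inFasta <- readAAStringSet("aminoAcid.fasta") ## for amino acid fasta
inFasta <- readDNAStringSet("dnaSeq.fasta")  ## for dna fasta

## get seq names from fasta
fa_given_names <- names(inFasta)

## prepare data frame mapping old names to new names
## (toy example: every name gets "_new" appended; use your own table instead)
df <- data.frame(seq_name = fa_given_names, new_name = paste0(fa_given_names, "_new"))

## look up each fasta seq name in the data frame; names without a match give NA
idx <- match(fa_given_names, df$seq_name)

## assign new seq names where a match exists, keep the original name otherwise
names(inFasta) <- ifelse(is.na(idx), fa_given_names, as.character(df$new_name[idx]))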

## write data to fasta file with updated names
writeXStringSet(inFasta , "fa_with_new_headers.fa")

Thank you for your help, but do you think it is possible in python3? I'm using it for my pipeline.


Toto26,

I see that you've used the python tag on your post, and it is generally recommended that python3 be used as the default python. Beyond the tag, though, nothing connects your question to the framework you want the solution in. It is advisable to add these details to the body of your post when you create it (especially in your case, where you seem to know what you want to use); this ensures others invest their precious time in the right direction.

Either that, or you can adapt their solution/algorithm to python3, which should not be a huge deal. It can also serve as a nice exercise, IMO.
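
As a rough idea of what such an adaptation might look like, here is a minimal python3 sketch using only the standard library. It is not code from any of the answers above: `rename_fasta_ids` is a hypothetical helper, and it assumes the sequence ID is the first whitespace-delimited word of each header line.

```python
import os
import tempfile


def rename_fasta_ids(in_path, out_path, new_names):
    """Copy a FASTA file, replacing record IDs that appear in new_names.

    new_names is a dict mapping old IDs to new IDs; IDs that are not in
    the dict are written unchanged.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                header = line[1:].rstrip("\n")
                # Split the ID (first word) from an optional description.
                seq_id, sep, description = header.partition(" ")
                seq_id = new_names.get(seq_id, seq_id)
                line = ">" + seq_id + sep + description + "\n"
            dst.write(line)


if __name__ == "__main__":
    # Self-contained demo on the example from the question.
    workdir = tempfile.mkdtemp()
    fasta_in = os.path.join(workdir, "test.fa")
    fasta_out = os.path.join(workdir, "test2.fa")
    with open(fasta_in, "w") as handle:
        handle.write(">seqA\nAAAAATTTGG\n>seqB\nATTGGGCCG\n"
                     ">seqC\nATTGGCC\n>seqD\nATTGGACAG\n")
    rename_fasta_ids(fasta_in, fasta_out, {"seqB": "BOBO", "seqC": "JOHN"})
    with open(fasta_out) as handle:
        print(handle.read(), end="")
```

With a pandas dataframe `df` whose columns are named as in the question (an assumption), the dict could be built with `dict(zip(df["seq name"], df["New name seq"]))`. Reading line by line keeps memory use low for large FASTA files, since only the header lines are modified.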


If that is important then you should have mentioned that from the beginning.
