Replace the sequence
2
0
Entering edit mode
8.1 years ago
22.venkat ▴ 10

Hi, I have two multiple sequence fasta files

file-1 I have 4000 fasta sequences with different IDs

file-2 I have 100 fasta sequences which IDs are already existed in file-1 but the sequences were different.

now i want to replace the sequence of file-1 with sequence of file-2 based on IDs can any one help regarding this

next-gen • 2.9k views
ADD COMMENT
4
Entering edit mode

What have you tried?

ADD REPLY
0
Entering edit mode

I understand the snarkiness when a question is trivial - but here it is misguided - this is not nearly as easy to solve as you imply. I don't think that there are 100 posts like this. The OP has a specific and peculiar problem - but then that is not surprising- most of bioinformatics is like that.

It is not clear how one should even go about it. So why pile up on the OP for "not trying" something.

I think the problem is solvable with some combination of a number of commands from the seqkit package

https://github.com/shenwei356/seqkit

but it may be that it requires writing a standalone script.

ADD REPLY
1
Entering edit mode

If OP tells us he tried Perl, or biopython, or excel, or R, those proficient in that language can try to help out. I'm not in favour of just presenting a solution. There is no obvious answer, and this indeed looks like a case for a script specific for this...

ADD REPLY
0
Entering edit mode

have you tried anything? there are a number of ways you can solve this.

ADD REPLY
0
Entering edit mode

I am new to this field. Could you suggest me any method

ADD REPLY
0
Entering edit mode

I would suggest a Biopython script.

ADD REPLY
0
Entering edit mode

I am new to this field. could you suggest me any method

ADD REPLY
0
Entering edit mode

could you provide me the script its really very helpful to me

ADD REPLY
2
Entering edit mode

You can not ask people to write scripts for you. You should try at least something. Why people are asking you "what have you tried" is because there are already 100s of posts here to solve your problem. So you need to search, and adapt those things.

ADD REPLY
0
Entering edit mode

Its exaggerating to say 100 posts, but the intention was that OP did not give an attempt to do it.

ADD REPLY
2
Entering edit mode
8.1 years ago

Create a hash (in perl sense) for each file using the IDs as keys and the sequences as values then replace the values from file 1 with those from file 2 having the same key e.g. something like this (in perl, untested):

use Bio::SeqIO;
my $seqio1 = Bio::SeqIO->new(-file => "file1.fa", '-format' => 'Fasta');
my %file1;
while(my $seq = $seqio1->next_seq) {
  my $string = $seq->seq;
  my $id = $seq->id;
  $file1{$id} = $seq;
}
my $seqio2 = Bio::SeqIO->new(-file => "file2.fa", '-format' => 'Fasta');
my %file2;
while(my $seq = $seqio2->next_seq) {
  my $string = $seq->seq;
  my $id = $seq->id;
  $file2{$id} = $seq;
}
foreach my $id(keys %file2) {
  $file1{$id} = $file2{$id}
}
ADD COMMENT
0
Entering edit mode

Thank you so much

ADD REPLY
1
Entering edit mode
8.1 years ago
22.venkat ▴ 10

Hi I used fallowing script and finished my task successfully.

#!/usr/bin/perl

use strict;
use warnings;

use Getopt::Long;

#set variables
my ($optHelp, $optFile1, $optFile2, $optList);

#get program options
GetOptions ('h' => \$optHelp, 'a=s' => \$optFile1, 'b=s' => \$optFile2, 'l=s' => \$optList);

# there are better ways to do this but I like this strategy :)
if($optHelp || !$optFile1 || !$optFile2  || !$optList ){
  print "Usege:\n\n";
  print "./program -a file1 -b file2 -l list<.tsv> \n\n";
  print "Note: sequences from file2 are replaced with those in file1 according to the list -l\n\n";
  exit(0);
}


#allocate memory (well, more of set variables)
my %hash1 =();
my %list = ();

#open files
open (ONE, "<", $optFile1) or die "$!";
open (TWO, "<", $optFile2) or die "$!";
open (LIST, "<", $optList) or die "$!";

#load list which should look like:
#file1_id <tab> file2_id

while(<LIST>){
  chomp;
  /^(.*?)\t(.*)/;
  $list{$1} = $2; #e
}
close LIST;

#read file one into memory
my $head;
while(<ONE>){
  chomp;
  if(/>(.*)/){$head = $1; next;} #e
  $hash1{$head} .= $_;
}
close ONE;

#replace fasta record from file to with the one in file one if id is set in list file
my $x = 0;
while(<TWO>){
  chomp;
  ## You can play with this if one line fasfa sequence is not what you can use
  if(/>(.*)/){if(defined $list{$1}){print $_. "\n" . $hash1{$list{$1}} . "\n"; $x = 1}else{$x = 0}}  #e
  print $_ . "\n" if $x == 0;  #e
}
close TWO;

but the intention was that OP did not give an attempt to do it. how you know Goutham Atla

ADD COMMENT
1
Entering edit mode

In general, people who post here for any programming help would post their code (which is not working as expected) along with some explanation (of the problem, errors in their code). There is no surprise Goutham (or any other, check comments of original post) has assumed that you haven't tried anything as you did not post any of your code in original thread. And you might be interested to check the following as a new user

How to Use Biostars, Part-I: Questions, Answers, Comments and Replies

ADD REPLY
0
Entering edit mode

Glad you solved the problem. As I mentioned, there are posts already with these type of questions. I could see this script coming from Fasta sequence replacement based on header name

ADD REPLY

Login before adding your answer.

Traffic: 4006 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6