Question

Replace the sequence

0

Entering edit mode

8.1 years ago

22.venkat ▴ 10

Hi, I have two multiple sequence fasta files

file-1 I have 4000 fasta sequences with different IDs

file-2 I have 100 fasta sequences which IDs are already existed in file-1 but the sequences were different.

now i want to replace the sequence of file-1 with sequence of file-2 based on IDs can any one help regarding this

next-gen • 2.9k views

ADD COMMENT • link 8.1 years ago by 22.venkat ▴ 10

4

Entering edit mode

What have you tried?

ADD REPLY • link 8.1 years ago by WouterDeCoster 47k

0

Entering edit mode

I understand the snarkiness when a question is trivial - but here it is misguided - this is not nearly as easy to solve as you imply. I don't think that there are 100 posts like this. The OP has a specific and peculiar problem - but then that is not surprising- most of bioinformatics is like that.

It is not clear how one should even go about it. So why pile up on the OP for "not trying" something.

I think the problem is solvable with some combination of a number of commands from the seqkit package

https://github.com/shenwei356/seqkit

but it may be that it requires writing a standalone script.

ADD REPLY • link 8.1 years ago by Istvan Albert 102k

1

Entering edit mode

If OP tells us he tried Perl, or biopython, or excel, or R, those proficient in that language can try to help out. I'm not in favour of just presenting a solution. There is no obvious answer, and this indeed looks like a case for a script specific for this...

ADD REPLY • link 8.1 years ago by WouterDeCoster 47k

0

Entering edit mode

have you tried anything? there are a number of ways you can solve this.

ADD REPLY • link 8.1 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

I am new to this field. Could you suggest me any method

ADD REPLY • link 8.1 years ago by 22.venkat ▴ 10

0

Entering edit mode

I would suggest a Biopython script.

ADD REPLY • link 8.1 years ago by WouterDeCoster 47k

0

Entering edit mode

I am new to this field. could you suggest me any method

ADD REPLY • link 8.1 years ago by 22.venkat ▴ 10

0

Entering edit mode

could you provide me the script its really very helpful to me

ADD REPLY • link 8.1 years ago by 22.venkat ▴ 10

2

Entering edit mode

You can not ask people to write scripts for you. You should try at least something. Why people are asking you "what have you tried" is because there are already 100s of posts here to solve your problem. So you need to search, and adapt those things.

ADD REPLY • link 8.1 years ago by GouthamAtla 12k

0

Entering edit mode

Its exaggerating to say 100 posts, but the intention was that OP did not give an attempt to do it.

ADD REPLY • link 8.1 years ago by GouthamAtla 12k

score 2 · Answer 1 · 2016-12-02

Create a hash (in perl sense) for each file using the IDs as keys and the sequences as values then replace the values from file 1 with those from file 2 having the same key e.g. something like this (in perl, untested):

use Bio::SeqIO;
my $seqio1 = Bio::SeqIO->new(-file => "file1.fa", '-format' => 'Fasta');
my %file1;
while(my $seq = $seqio1->next_seq) {
  my $string = $seq->seq;
  my $id = $seq->id;
  $file1{$id} = $seq;
}
my $seqio2 = Bio::SeqIO->new(-file => "file2.fa", '-format' => 'Fasta');
my %file2;
while(my $seq = $seqio2->next_seq) {
  my $string = $seq->seq;
  my $id = $seq->id;
  $file2{$id} = $seq;
}
foreach my $id(keys %file2) {
  $file1{$id} = $file2{$id}
}

WouterDeCoster · Answer 2 · 2016-12-05

Hi I used fallowing script and finished my task successfully.

#!/usr/bin/perl

use strict;
use warnings;

use Getopt::Long;

#set variables
my ($optHelp, $optFile1, $optFile2, $optList);

#get program options
GetOptions ('h' => \$optHelp, 'a=s' => \$optFile1, 'b=s' => \$optFile2, 'l=s' => \$optList);

# there are better ways to do this but I like this strategy :)
if($optHelp || !$optFile1 || !$optFile2  || !$optList ){
  print "Usege:\n\n";
  print "./program -a file1 -b file2 -l list<.tsv> \n\n";
  print "Note: sequences from file2 are replaced with those in file1 according to the list -l\n\n";
  exit(0);
}


#allocate memory (well, more of set variables)
my %hash1 =();
my %list = ();

#open files
open (ONE, "<", $optFile1) or die "$!";
open (TWO, "<", $optFile2) or die "$!";
open (LIST, "<", $optList) or die "$!";

#load list which should look like:
#file1_id <tab> file2_id

while(<LIST>){
  chomp;
  /^(.*?)\t(.*)/;
  $list{$1} = $2; #e
}
close LIST;

#read file one into memory
my $head;
while(<ONE>){
  chomp;
  if(/>(.*)/){$head = $1; next;} #e
  $hash1{$head} .= $_;
}
close ONE;

#replace fasta record from file to with the one in file one if id is set in list file
my $x = 0;
while(<TWO>){
  chomp;
  ## You can play with this if one line fasfa sequence is not what you can use
  if(/>(.*)/){if(defined $list{$1}){print $_. "\n" . $hash1{$list{$1}} . "\n"; $x = 1}else{$x = 0}}  #e
  print $_ . "\n" if $x == 0;  #e
}
close TWO;

but the intention was that OP did not give an attempt to do it. how you know Goutham Atla