I understand the snarkiness when a question is trivial - but here it is misguided - this is not nearly as easy to solve as you imply. I don't think that there are 100 posts like this. The OP has a specific and peculiar problem - but then that is not surprising- most of bioinformatics is like that.
It is not clear how one should even go about it. So why pile up on the OP for "not trying" something.
I think the problem is solvable with some combination of a number of commands from the seqkit package
If OP tells us he tried Perl, or biopython, or excel, or R, those proficient in that language can try to help out. I'm not in favour of just presenting a solution. There is no obvious answer, and this indeed looks like a case for a script specific for this...
You can not ask people to write scripts for you. You should try at least something. Why people are asking you "what have you tried" is because there are already 100s of posts here to solve your problem. So you need to search, and adapt those things.
Create a hash (in perl sense) for each file using the IDs as keys and the sequences as values then replace the values from file 1 with those from file 2 having the same key e.g. something like this (in perl, untested):
use Bio::SeqIO;
my $seqio1 = Bio::SeqIO->new(-file => "file1.fa", '-format' => 'Fasta');
my %file1;
while(my $seq = $seqio1->next_seq) {
my $string = $seq->seq;
my $id = $seq->id;
$file1{$id} = $seq;
}
my $seqio2 = Bio::SeqIO->new(-file => "file2.fa", '-format' => 'Fasta');
my %file2;
while(my $seq = $seqio2->next_seq) {
my $string = $seq->seq;
my $id = $seq->id;
$file2{$id} = $seq;
}
foreach my $id(keys %file2) {
$file1{$id} = $file2{$id}
}
Hi I used fallowing script and finished my task successfully.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
#set variables
my ($optHelp, $optFile1, $optFile2, $optList);
#get program options
GetOptions ('h' => \$optHelp, 'a=s' => \$optFile1, 'b=s' => \$optFile2, 'l=s' => \$optList);
# there are better ways to do this but I like this strategy :)
if($optHelp || !$optFile1 || !$optFile2 || !$optList ){
print "Usege:\n\n";
print "./program -a file1 -b file2 -l list<.tsv> \n\n";
print "Note: sequences from file2 are replaced with those in file1 according to the list -l\n\n";
exit(0);
}
#allocate memory (well, more of set variables)
my %hash1 =();
my %list = ();
#open files
open (ONE, "<", $optFile1) or die "$!";
open (TWO, "<", $optFile2) or die "$!";
open (LIST, "<", $optList) or die "$!";
#load list which should look like:
#file1_id <tab> file2_id
while(<LIST>){
chomp;
/^(.*?)\t(.*)/;
$list{$1} = $2; #e
}
close LIST;
#read file one into memory
my $head;
while(<ONE>){
chomp;
if(/>(.*)/){$head = $1; next;} #e
$hash1{$head} .= $_;
}
close ONE;
#replace fasta record from file to with the one in file one if id is set in list file
my $x = 0;
while(<TWO>){
chomp;
## You can play with this if one line fasfa sequence is not what you can use
if(/>(.*)/){if(defined $list{$1}){print $_. "\n" . $hash1{$list{$1}} . "\n"; $x = 1}else{$x = 0}} #e
print $_ . "\n" if $x == 0; #e
}
close TWO;
but the intention was that OP did not give an attempt to do it. how you know Goutham Atla
In general, people who post here for any programming help would post their code (which is not working as expected) along with some explanation (of the problem, errors in their code). There is no surprise Goutham (or any other, check comments of original post) has assumed that you haven't tried anything as you did not post any of your code in original thread. And you might be interested to check the following as a new user
What have you tried?
I understand the snarkiness when a question is trivial - but here it is misguided - this is not nearly as easy to solve as you imply. I don't think that there are 100 posts like this. The OP has a specific and peculiar problem - but then that is not surprising- most of bioinformatics is like that.
It is not clear how one should even go about it. So why pile up on the OP for "not trying" something.
I think the problem is solvable with some combination of a number of commands from the
seqkit
packagehttps://github.com/shenwei356/seqkit
but it may be that it requires writing a standalone script.
If OP tells us he tried Perl, or biopython, or excel, or R, those proficient in that language can try to help out. I'm not in favour of just presenting a solution. There is no obvious answer, and this indeed looks like a case for a script specific for this...
have you tried anything? there are a number of ways you can solve this.
I am new to this field. Could you suggest me any method
I would suggest a Biopython script.
I am new to this field. could you suggest me any method
could you provide me the script its really very helpful to me
You can not ask people to write scripts for you. You should try at least something. Why people are asking you "what have you tried" is because there are already 100s of posts here to solve your problem. So you need to search, and adapt those things.
Its exaggerating to say 100 posts, but the intention was that OP did not give an attempt to do it.