Hello, Biostars community I would like to receive your support with the next question.
I have developed this simple script to concatenate fasta sequences from different files in one single resulting file according to the same ID of each sequence respectively and It works, nevertheless, I would like to know if there is a way to control the order of the concatenation according to the name of the file.
For example
fie1_OTIB.fasta
>seq1
aaa
>seq2
ccc
file2_OTIA.fasta
>seq1
ttt
>seq2
ggg
file3_OTSS.fasta
>seq1
ggg
>seq2
ttt
file4_OTLS
>seq1
ccc
>seq2
aaa
For now, the script is going to give me this result in one fasta file
>seq1
TTTAAAGGGCCC
Order for the seq1 >> file2 file1 file3 file4
>seq2
GGGCCCTTTAAA
Order for the seq2 >> file2 file1 file3 file4
Here I would like to know if I could modify the script to order the concatenation according to my preference file due that, for now, it is taking a predetermined order, I would like in one case for example
>seq1
AAATTTGGGCCC
ORDER = file1 file2 file3 file4
>seq2
AAATTTGGGCCC
ORDER = file1 file2 file3 file4
Thanks in advance for your valuable support,
use Bio::SeqIO;
$input = "FOLDER_INPUT_FASTA_DIRECTORY";
my %seqs;
opendir DH, $input or die "Cannot open directory: $!";
open (outfile, ">","OUTPUT_DIRECTOTY");
while (my $file = readdir(DH)) {
next if $file =~ /^\./;
$new_file= $input."/".$file;
open GB, "$new_file" or die $!;
my $seqio = Bio::SeqIO->new(-file => $new_file, -format => "fasta");
while(my $seq = $seqio->next_seq) {
my $seq_seq = $seq->seq();
my $seq_id = $seq->id();
$seqobj = Bio::Seq->new( -display_id => $seq_id, -seq => $seq_seq);
$seq_id = $seqobj->id();
$seqstr = $seqobj->seq();
$seqs{$seqobj->display_id} .= $seqobj->seq;
}
}
foreach my $key (keys %seqs) {
print outfile ">$key\n$seqs{$key}\n";
}
do all the files have same number of sequences? Damian_Civ. Try
seqkit concat
. It concatenates as per ID and file name order.In your code,
opendir
will get the directory content in sorted order.As you mentioned if the names of the files are exactly the same as
fie1_OTIB.fasta, file2_OTIA.fasta, file3_OTSS.fasta, file4_OTLS
then you will get output as your expecting but if the file names are withoutfile(number)_
prefix thenopendir
will get the directory content in a different order likeOTIA.fasta, OTIB.fasta, OTLS.fasta, OTSS.fasta
and your output will befile2 file1 file4 file3
.If this is the problem then renaming the files in lexicographical order would be a dummy solution.
Thanks.