Hi all, I have two FASTA files, candidates_aa_0042.fasta and candidates_aa_0035.fasta,
and two dataframes, Best_blast_candidate_hit_0042.csv and Best_blast_candidate_hit_0035.csv.
Here is an example of their contents:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore salltitles staxids scientific_name scomnames sskingdoms Order
g44459.t1_0035_0035 XP_011687429.1 39.5 157 95 0 7 163 2 158 8.1e-27 129.8 uncharacterized protein LOC105449744 [Wasmannia auropunctata] 64793 Wasmannia auropunctata Eukaryota Hymenoptera
g17612.t1_0035_0042 XP_011699787.1 59.3 349 142 0 99 447 336 684 1.5e-120 442.6 uncharacterized protein LOC105457055 [Wasmannia auropunctata] 64793 Wasmannia auropunctata Eukaryota Hymenoptera
g29924.t1_0035_0042 XP_011871948.1 67.0 261 85 1 1 260 18 278 1.3e-100 375.6 uncharacterized protein LOC105564266, partial [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g47960.t1_0035_0035 XP_011860868.1 68.8 298 93 0 1 298 142 439 3.3e-116 427.6 uncharacterized protein LOC105558006 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g28580.t1_0035_0042 XP_011883624.1 70.0 240 69 3 1 239 41 278 1.3e-86 328.9 uncharacterized protein LOC105570787 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
and
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore salltitles staxids scientific_name scomnames sskingdoms Order
g34354.t1_0042_0035 XP_011699801.1 43.7 135 63 4 7 128 625 759 9.3e-17 96.3 LOW QUALITY PROTEIN 64793 Wasmannia auropunctata Eukaryota Hymenoptera
g34606.t1_0042_0035 XP_011871948.1 59.8 249 79 2 1 228 51 299 3.4e-81 310.8 uncharacterized protein LOC105564266, partial [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g13215.t1_0042_0042 XP_011883625.1 62.0 242 92 0 46 287 160 401 5.4e-82 313.9 uncharacterized protein LOC105570788, partial [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g35379.t1_0042_0035 XP_011858260.1 73.3 191 51 0 4 194 690 880 6.3e-76 293.1 uncharacterized protein LOC105555830 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g13770.t1_0042_0042 XP_011883624.1 66.5 203 65 3 10 211 33 233 1.9e-65 258.5 uncharacterized protein LOC105570787 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
I need to merge them, but with the rows in the same order as the sequence IDs in the FASTA files.
For example, if FASTA file 1 contains:
>seq1_0035_0042
ATGGAGAGATAG
>seq6_0035_0035
ATGGATAGAGA
and FASTA file 2 contains:
>seq8_0042_0042
ATGGAGAGATAG
>seq3_0042_0035
ATGGATAGAGA
then I would like to merge my dataframes in that order, e.g.:
qseqid_1 qseqid_2 sseqid_1 sseqid_2 pident_1 pident_2 etc...
seq1_0035_0042 seq8_0042_0042 XP_011883678.1 XP_011883789.1 78.9 45.9 etc...
seq6_0035_0035 seq3_0042_0035 XP_011566754.1 XP_011566754.1 67.9 78.0 etc...
PS: not all sequence IDs in the FASTA files are present in the dataframes, so if an ID has no match, could we still add it to the merged dataframe and put NaN in the corresponding columns (e.g. the _2 columns)? Thanks for your help :)
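
To make the request concrete, here is a minimal pandas sketch of one way this could be done. It rests on several assumptions: each qseqid appears at most once per dataframe (best hit only), both dataframes have the same columns, and the two files are paired purely by position (first ID of FASTA file 1 with first ID of FASTA file 2, and so on). The helper names fasta_ids, align_to_fasta and merge_by_fasta_order are hypothetical, and the toy data reuses the values from the example above, with seq3_0042_0035 deliberately left without a hit.

```python
import io
import pandas as pd

def fasta_ids(handle):
    """Return the sequence IDs (first word after '>') in file order."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]

def align_to_fasta(df, ids, suffix):
    """Reorder rows to follow `ids`; IDs without a hit become NaN rows.

    Assumes `qseqid` is unique in `df` (one best hit per query).
    """
    out = df.set_index("qseqid").reindex(ids)
    out.index.name = "qseqid"
    return out.reset_index().add_suffix(suffix)

def merge_by_fasta_order(df1, ids1, df2, ids2):
    """Pair the two tables row by row, in FASTA order (positional pairing)."""
    left = align_to_fasta(df1, ids1, "_1")
    right = align_to_fasta(df2, ids2, "_2")
    # Side-by-side concat; if one file has more IDs, the other side gets NaN.
    merged = pd.concat([left, right], axis=1)
    # Interleave columns: qseqid_1, qseqid_2, sseqid_1, sseqid_2, ...
    cols = [f"{c}{s}" for c in df1.columns for s in ("_1", "_2")]
    return merged[cols]

# Toy input mirroring the example in the post (not real results).
fasta1 = io.StringIO(">seq1_0035_0042\nATGGAGAGATAG\n>seq6_0035_0035\nATGGATAGAGA\n")
fasta2 = io.StringIO(">seq8_0042_0042\nATGGAGAGATAG\n>seq3_0042_0035\nATGGATAGAGA\n")

df1 = pd.DataFrame({
    "qseqid": ["seq6_0035_0035", "seq1_0035_0042"],  # not in FASTA order
    "sseqid": ["XP_011566754.1", "XP_011883678.1"],
    "pident": [67.9, 78.9],
})
df2 = pd.DataFrame({
    "qseqid": ["seq8_0042_0042"],  # seq3_0042_0035 has no BLAST hit
    "sseqid": ["XP_011883789.1"],
    "pident": [45.9],
})

merged = merge_by_fasta_order(df1, fasta_ids(fasta1), df2, fasta_ids(fasta2))
print(merged)
```

With the real files, the io.StringIO objects would be replaced by open("candidates_aa_0035.fasta") and so on, and the dataframes would be loaded with pd.read_csv, using whatever separator the CSV files actually have. Because the ID is kept as the index during reindex, a sequence without a hit still shows up with its ID and NaN in the other columns.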