Hi everyone
I am trying to compare the promoter sequences of two species using blast2seq. I have a table, generated with Python from the genome GenBank file and saved as a .txt file, that looks something like this; each GeneID has a corresponding XP accession and other information:
LOC101251020 XP_004228330.1 GeneID:101251020 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 6
LOC101251313 XP_004228331.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-like
LOC101251313 XP_010314935.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-like
LOC101251313 XP_010315084.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-li
LOC101264245 XP_004228763.1 GeneID:101264245 NAC domain-containing protein 78-like
LOC101264547 XP_004228764.1 GeneID:101264547 uncharacterized protein LOC101264547
LOC104645410 XP_010315223.1 GeneID:104645410 probable E3 ubiquitin protein ligase DRIPH
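A minimal Python sketch of the first lookup (XP accession to GeneID). It assumes the table is whitespace-separated, with the locus tag, the XP accession and GeneID:<number> in the first three columns; the description may contain spaces, so it is split off and ignored. The function name load_xp_to_gene and the file name table1.txt are hypothetical.

```python
def load_xp_to_gene(lines):
    """Map each XP protein accession to its numeric GeneID.

    Assumes whitespace-separated columns: locus tag, XP accession,
    'GeneID:<number>', then a free-text description (ignored here).
    """
    xp_to_gene = {}
    for line in lines:
        fields = line.split(None, 3)  # maxsplit=3: description may contain spaces
        if len(fields) < 3 or not fields[2].startswith("GeneID:"):
            continue  # skip blank or malformed lines
        xp_to_gene[fields[1]] = fields[2].split(":", 1)[1]
    return xp_to_gene


# Usage (hypothetical file name):
# with open("table1.txt") as fh:
#     xp_to_gene = load_xp_to_gene(fh)
```

Several XP accessions can map to the same GeneID (for example the three isoforms of LOC101251313), which is fine here because the lookup only goes from XP to GeneID.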
What I am interested in here are the GeneID and the XP accession, because each promoter is labelled by its GeneID, like this:
Promoter___GeneID:101251020___43560:50059 TTATGATGGGTGACCCCCTCCGAAGTCCTTGTGTTGCATCCCTCCTTTTTTTCAAAATCGGTCGTGTAATTGAAAAATATTTTTATTTATTTATTTTTGCAGATACGACGTTC
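The promoter file can be indexed by GeneID in a similar way. This sketch pulls the number out of the header with a regular expression and accepts either one record per line (header, a space, then the sequence, as in the example above) or ordinary FASTA records whose headers start with ">"; which layout the real file uses is an assumption. The name load_promoters is hypothetical.

```python
import re

GENE_ID = re.compile(r"GeneID:(\d+)")


def load_promoters(lines):
    """Map GeneID -> promoter sequence.

    Accepts one-line records ('Promoter___GeneID:...___start:end SEQ')
    or FASTA records ('>Promoter___GeneID:...' header, sequence below).
    """
    promoters = {}
    current = None  # GeneID of the FASTA record being read, if any
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = GENE_ID.search(line)
            current = match.group(1) if match else None
            if current is not None:
                promoters[current] = ""
        elif "GeneID:" in line:
            header, _, seq = line.partition(" ")
            match = GENE_ID.search(header)
            if match:
                promoters[match.group(1)] = seq.replace(" ", "")
            current = None
        elif current is not None:
            promoters[current] += line  # FASTA sequence continuation line
    return promoters
```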
I also have a second text file, shown below, that lists the matched sequence pairs. For each XP accession in that file, I want to look up its GeneID in the first table and then pull the corresponding promoter sequence from the promoter files, so that I can compare the promoters of each pair with BLAST. Does anyone know how to do this automatically for a large number of sequences? Thank you very much.
1 3517 S.lyco.fasta 1.000 XP_004228331.1 100%
1 3517 spen.fasta 1.000 XP_004228763.1 100%
2 3145 S.lyco.fasta 1.000 XP_004236763.1 100%
2 3145 spen.fasta 1.000 XP_004228763.1 100%
3 3078 S.lyco.fasta 1.000 XP_008522763.1 100%
3 3078 spen.fasta 1.000 XP_000753763.1 100%
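Putting the lookups together: a sketch that groups the pair table by its first column, assumed to be the cluster/pair number (column 3 is taken as the species file and column 5 as the XP accession; the other columns are ignored). Given a dictionary from XP accession to GeneID and one from GeneID to promoter sequence, it writes each complete pair to two small FASTA files. The names group_pairs and write_pair_fastas and the pairN_a.fa / pairN_b.fa file naming are hypothetical; pairs where an XP accession or a promoter cannot be found are skipped rather than guessed.

```python
import os
from collections import defaultdict


def group_pairs(pair_lines, xp_to_gene, promoters):
    """Return {cluster_id: [(species_file, xp, gene_id, promoter_seq), ...]}."""
    clusters = defaultdict(list)
    for line in pair_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        cluster_id, species, xp = fields[0], fields[2], fields[4]
        gene_id = xp_to_gene.get(xp)   # None if the XP is not in table one
        seq = promoters.get(gene_id)   # None if no promoter for that GeneID
        clusters[cluster_id].append((species, xp, gene_id, seq))
    return dict(clusters)


def write_pair_fastas(clusters, outdir="."):
    """Write query/subject FASTA files for every complete two-member cluster.

    Returns a list of (query_path, subject_path) tuples.
    """
    jobs = []
    for cluster_id, members in clusters.items():
        if len(members) != 2 or any(m[3] is None for m in members):
            continue  # skip clusters with a missing GeneID or promoter
        paths = []
        for tag, (species, xp, gene_id, seq) in zip(("a", "b"), members):
            path = os.path.join(outdir, f"pair{cluster_id}_{tag}.fa")
            with open(path, "w") as fh:
                fh.write(f">GeneID:{gene_id} {xp} {species}\n{seq}\n")
            paths.append(path)
        jobs.append(tuple(paths))
    return jobs
```

Each returned (query, subject) pair can then be compared with BLAST+, for example blastn -query pair1_a.fa -subject pair1_b.fa -outfmt 6. In BLAST+ the -subject option aligns two sequences directly without building a database, which is how it replaces the older bl2seq program.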