Average Amino Acid Identity (AAI) analysis manually
3.2 years ago
fec2 ▴ 50

Hi all,

I need to perform an Average Amino Acid Identity (AAI) analysis on 422 genomes on a SLURM cluster that only allows jobs to run for 3 days. Tools like CompareM can't finish the job in time, so I would like to run the analysis myself using parallel, awk or sed.

However, I don't fully understand how this analysis works. As far as I can tell, the proteins of the query genome are searched with BLAST against the reference genome, keeping hits with at least 30% identity and at least 70% coverage. The top match for each protein is then used in a reverse BLAST search with the same cut-offs.
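For concreteness, the reciprocal step described above can be sketched with awk once both directions have been searched. This is only a minimal sketch, not how CompareM or any other tool actually implements it. It assumes each BLAST run was written with -outfmt "6 qseqid sseqid pident qcovhsp bitscore" (a column layout chosen for this example, not the plain -outfmt 6 used elsewhere in this thread), that "top match" means the hit with the highest bitscore, and that AAI is the mean identity of the reciprocal pairs as seen in the forward direction. The function name aai_rbh is made up for illustration.

```shell
# aai_rbh FORWARD.tsv REVERSE.tsv
# Assumed columns in both files: qseqid sseqid pident qcovhsp bitscore
# Prints: <number of reciprocal best hits> TAB <mean percent identity>
aai_rbh() {
  awk -F'\t' -v min_id=30 -v min_cov=70 '
    # Skip hits that fail either cut-off.
    $3 + 0 < min_id + 0 || $4 + 0 < min_cov + 0 { next }

    # First file (A vs B): remember the highest-bitscore hit per query.
    FILENAME == ARGV[1] {
      if (!($1 in fbest) || $5 + 0 > fscore[$1]) {
        fbest[$1] = $2; fscore[$1] = $5 + 0; fid[$1] = $3 + 0
      }
      next
    }

    # Second file (B vs A): same bookkeeping for the reverse direction.
    {
      if (!($1 in rbest) || $5 + 0 > rscore[$1]) {
        rbest[$1] = $2; rscore[$1] = $5 + 0
      }
    }

    END {
      # A pair counts only if each protein is the best hit of the other.
      for (q in fbest) {
        s = fbest[q]
        if ((s in rbest) && rbest[s] == q) { sum += fid[q]; n++ }
      }
      if (n > 0) printf "%d\t%.2f\n", n, sum / n
      else print "0\tNA"
    }
  ' "$1" "$2"
}
```

Called as aai_rbh A_vs_B.tsv B_vs_A.tsv (hypothetical file names), it prints the number of reciprocal pairs and their mean identity. Some tools average the identities from both directions instead, so the numbers may differ slightly from theirs.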

I previously ran a similar analysis, percentage of conserved proteins, using a command like the one below:

cat allpairs.txt | parallel --colsep ' ' -j 32 blastp -query {1} -subject {2} -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {1}_{2}.tsv

Here I first save a file (allpairs.txt) containing all the pairs of genomes I want to compare, and then run BLAST on each pair using the parallel command.
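A minimal sketch of how such a pair list could be generated is shown here. It lists every ordered pair, so each genome pair appears in both directions (A B and B A), which is what a reverse search needs. For 422 genomes that is 422 × 421 = 177,662 lines. The function name make_pairs and the file names in the usage comment are made up for illustration.

```shell
# make_pairs FILE...
# Prints one "query subject" line for every ordered pair of distinct files.
make_pairs() {
  for a in "$@"; do
    for b in "$@"; do
      if [ "$a" != "$b" ]; then
        printf '%s %s\n' "$a" "$b"
      fi
    done
  done
}

# Hypothetical usage, run inside the directory that holds the protein FASTA files:
#   make_pairs *.faa > allpairs.txt
```

Using bare file names without a directory part keeps the {1}_{2}.tsv output name of the parallel command a valid file name.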

But I don't understand how to perform the reverse search with the same cut-offs. Is it possible to do it using parallel, awk or sed?

Thank you very much.

Best regards,

Felix

awk parallel sed AAI BLAST • 1.7k views
3.2 years ago
Mensur Dlakic ★ 28k

You may want to try some recently developed programs that can do this in a couple of hours on a simple computer.

For nucleotides:


Thanks for your suggestion. I have tried a few tools, but they can't finish 422 genomes in 3 days, even when I switch from BLAST to Prodigal. The online tools all limit the number of genomes. I need AAI rather than another analysis because this is a taxonomy study at the genus level. But thanks anyway.


With all due respect, it is fairly trivial to compare 422 genomes in much less than 3 days, but not necessarily by using BLAST. It doesn't seem like you looked at the links I provided earlier, so these may also be for nothing, but here they are just in case:


In fact, I have tried all the AAI tools before, I swear. As for other comparison tools, the reason I need AAI is that this analysis is more or less the standard for delineating bacterial genera, so the reviewer wants it. For the analysis I mentioned above, we removed from the text file (allpairs.txt) the pairs whose BLAST runs had already finished and continued for another 3 days. The reason I posted here is that I don't understand how to do the reverse search for the top BLAST match with the same cut-offs. Thank you.
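The restart step could be scripted roughly as follows. This is a sketch that assumes the {1}_{2}.tsv output naming from the parallel command above and that the script runs in the directory where those files were written. The function name pending_pairs is made up. It only checks that an output file exists, so any file that was still being written when the job hit the time limit should be deleted first, otherwise a truncated result is treated as finished.

```shell
# pending_pairs PAIRS_FILE
# Prints the pairs from PAIRS_FILE that do not yet have an output file.
pending_pairs() {
  while read -r a b; do
    # Keep the pair only if its result file is missing.
    [ -e "${a}_${b}.tsv" ] || printf '%s %s\n' "$a" "$b"
  done < "$1"
}

# Hypothetical usage:
#   pending_pairs allpairs.txt > remaining.txt
```

The remaining.txt file can then be fed to the same parallel command in the next 3-day job.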
