4.6 years ago
zarodkip
Hello all,
I am trying to establish the core proteome of A. baumannii, i.e. the proteins that all of the strains have in common. I have multiple .fasta proteome files.
What would be the most appropriate way of going about this? Would this do the trick?
blastp -query query.fasta -db db -out output.txt -outfmt "6 qseqid qlen sseqid salltitles pident mismatch gapopen qstart qend qcovs sstart send evalue bitscore" -evalue 0.00001 -max_target_seqs 5 -num_threads 4
Also, is there a way to BLAST one proteome against all of my other proteomes at once, rather than one proteome against one proteome at a time?
I should add that I am very new to local BLAST and to coding in general.
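On the one-vs-all question, one common approach is to pool all proteomes into a single file and build one BLAST database from it. The following is a minimal sketch, not a definitive pipeline. It assumes one multi-FASTA file per strain, named after the strain with a .fasta extension. The `__` separator and the function names `tag_proteome` and `pool_proteomes` are made up for illustration; tagging each header with its strain keeps track of where every hit came from.

```shell
#!/usr/bin/env bash

# Prefix each FASTA header with the strain name taken from the file name,
# e.g. ">prot1" in strainA.fasta becomes ">strainA__prot1".
# The "__" separator is an arbitrary choice for this sketch.
tag_proteome() {
    local fasta="$1"
    local strain
    strain="$(basename "$fasta" .fasta)"
    awk -v s="$strain" '
        /^>/ { print ">" s "__" substr($0, 2); next }  # header line: add strain tag
        { print }                                      # sequence line: unchanged
    ' "$fasta"
}

# Pool all tagged proteomes into one multi-FASTA file.
# Usage: pool_proteomes OUTPUT_FILE INPUT.fasta [INPUT.fasta ...]
pool_proteomes() {
    local out="$1"
    shift
    : > "$out"                      # start from an empty output file
    local f
    for f in "$@"; do
        tag_proteome "$f" >> "$out"
    done
}

# Example usage (requires BLAST+; all file names are placeholders):
#   pool_proteomes all_strains.faa proteomes/*.fasta
#   makeblastdb -in all_strains.faa -dbtype prot -out all_strains_db
#   blastp -query query.fasta -db all_strains_db -out output.txt \
#          -outfmt 6 -evalue 1e-5 -num_threads 4
```

With a pooled database, a setting such as -max_target_seqs 5 would likely cap the number of strains in which a query protein can be matched, so it probably needs to be at least as large as the number of strains.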
What format is your data in? Multi-FASTA protein sequence files, one per strain? If these are very similar strains (and your dataset is reasonably complete in each case), then you may be able to use CD-HIT to cluster the pooled proteins into a non-redundant set; the clusters that contain a member from every strain would approximate the core proteome.
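To make the CD-HIT idea concrete, here is a rough sketch of how the .clstr output could be screened for clusters that contain at least one protein from every strain. It assumes the sequence IDs were tagged beforehand as strain__protein (a made-up convention for this example) and that CD-HIT was run with -d 0 so that IDs are not truncated in the .clstr file. The function name `core_clusters` is hypothetical.

```shell
#!/usr/bin/env bash

# Print the representative ID of every cluster that has members from
# exactly N_STRAINS distinct strains (i.e. from every strain).
# Usage: core_clusters FILE.clstr N_STRAINS
core_clusters() {
    local clstr="$1" n_strains="$2"
    awk -v n="$n_strains" '
        # Report the finished cluster if all strains are present, then reset.
        function emit() {
            if (rep != "" && count == n) print rep
            split("", seen); count = 0; rep = ""
        }
        /^>Cluster/ { emit(); next }
        {
            id = $0
            sub(/^.*>/, "", id)          # drop everything up to the ">" before the ID
            sub(/\.\.\..*$/, "", id)     # drop the trailing "..." and what follows
            strain = id
            sub(/__.*$/, "", strain)     # strain tag = text before "__"
            if (!(strain in seen)) { seen[strain] = 1; count++ }
            if ($0 ~ /\*$/) rep = id     # the representative line ends with "*"
        }
        END { emit() }
    ' "$clstr"
}

# Example usage (requires CD-HIT; file names and the 0.9 identity
# threshold are placeholders to adjust for your data):
#   cd-hit -i all_strains.faa -o clusters -c 0.9 -d 0
#   core_clusters clusters.clstr 12 > core_representatives.txt
```

Clusters are counted by distinct strains rather than by number of members, so a cluster with paralogs from one strain still qualifies as long as every strain is represented.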
Yes, they are multi-fasta sequence files one per strain. I'll look into CD-HIT. Thank you.
I recommend hmmscan from HMMER (the program that searches protein sequences against a profile database), run against Pfam. The results are far easier to interpret this way.
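As a rough illustration of the hmmscan route, the sketch below assumes each strain's proteome has already been scanned against Pfam with --tblout, giving one table per strain, and lists the Pfam accessions found in every table. Note that this gives shared domain families rather than shared proteins, so it is only an approximation of a core proteome. The function name `shared_families` is made up for this example.

```shell
#!/usr/bin/env bash

# List Pfam accessions that appear in every hmmscan --tblout file given.
# In hmmscan tblout output, column 1 is the profile name and column 2 its
# accession; lines starting with "#" are comments.
# Usage: shared_families STRAIN1.tbl STRAIN2.tbl [...]
shared_families() {
    local n="$#"
    awk -v n="$n" '
        /^#/ { next }                          # skip comment lines
        {
            key = FILENAME SUBSEP $2
            if (!(key in seen)) {              # count each accession once per file
                seen[key] = 1
                files[$2]++
            }
        }
        END { for (acc in files) if (files[acc] == n) print acc }
    ' "$@" | sort
}

# Example usage (requires HMMER and a Pfam-A.hmm download; file names
# are placeholders):
#   hmmpress Pfam-A.hmm
#   hmmscan --cut_ga --tblout strainA.tbl Pfam-A.hmm strainA.fasta > /dev/null
#   hmmscan --cut_ga --tblout strainB.tbl Pfam-A.hmm strainB.fasta > /dev/null
#   shared_families strainA.tbl strainB.tbl > shared_pfam.txt
```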