Hello everybody.
As I have only recently started studying bioinformatics, I would like an expert to check whether my "pipeline" is correct.
Let's say we have two genomes, A and B, and from each genome a set of encoded genes: Ax (20 genes) and Bx (15 genes), respectively.
To find the non-shared genes between those two genomes:
I created a database (DB) of the genome A genes (the Ax database) and BLASTed Bx against Ax. I used these commands to count the number of shared genes:
for genome A:
awk '{print $1}' blast_result.txt | sort -u | wc -l --> 10 genes
and for genome B:
awk '{print $2}' blast_result.txt | sort -u | wc -l --> 8 genes
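Assuming the BLAST output was written in the default tabular format (`-outfmt 6`), column 1 is the query (here the Bx genes) and column 2 is the subject (the Ax genes). A minimal Python sketch of the same counting step, with an added e-value filter (the 1e-5 cutoff is an illustrative assumption, not a recommendation), could look like this:

```python
import io
from typing import Set, TextIO, Tuple

# Column positions in the default BLAST tabular output (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QSEQID, SSEQID, EVALUE = 0, 1, 10


def hit_ids(handle: TextIO, max_evalue: float = 1e-5) -> Tuple[Set[str], Set[str]]:
    """Return the distinct (query IDs, subject IDs) of hits with e-value <= max_evalue.

    The default cutoff of 1e-5 is an illustrative assumption.
    """
    queries: Set[str] = set()
    subjects: Set[str] = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE]) <= max_evalue:
            queries.add(fields[QSEQID])
            subjects.add(fields[SSEQID])
    return queries, subjects


if __name__ == "__main__":
    # Made-up hits of Bx genes (queries) against the Ax database (subjects).
    demo = io.StringIO(
        "B1\tA3\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-50\t500\n"
        "B1\tA4\t60.0\t200\t80\t2\t1\t200\t5\t204\t1e-10\t150\n"
        "B2\tA7\t40.0\t100\t60\t1\t1\t100\t1\t100\t0.5\t30\n"  # weak hit, filtered out
    )
    queries, subjects = hit_ids(demo)
    print(len(queries), len(subjects))  # prints: 1 2
```

Like the awk/sort/wc pipeline, this counts each ID once however many hits it has; the filter simply drops weak hits before counting.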
Then I ran tblastn of the Ax genes against the genome B sequence, which gave --> 1 hit
and of the Bx genes against the genome A sequence, which gave --> 2 hits
So the actual number of non-shared genes between genomes A and B is:
For genome A: 20 - (10 + 1) = 9 genes not shared with genome B
For genome B: 15 - (8 + 2) = 5 genes not shared with genome A
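Subtracting counts assumes the tblastn hits are genes that the first search did not already find. A sketch that works on sets of gene IDs instead makes that explicit, because a gene found by both searches is counted only once. All gene names below are hypothetical placeholders:

```python
from typing import Set


def non_shared(all_genes: Set[str], gene_hits: Set[str], genome_hits: Set[str]) -> Set[str]:
    """Genes of one genome with no hit in either search against the other genome."""
    shared = gene_hits | genome_hits  # set union: a gene found by both searches counts once
    return all_genes - shared


if __name__ == "__main__":
    genes_a = {f"A{i}" for i in range(1, 21)}    # 20 hypothetical genome-A genes
    gene_hits = {f"A{i}" for i in range(1, 11)}  # 10 with a hit in the gene-vs-gene search
    genome_hits = {"A11"}                        # 1 extra tblastn hit in the genome-B sequence
    print(len(non_shared(genes_a, gene_hits, genome_hits)))  # prints: 9, i.e. 20 - (10 + 1)
```

If the tblastn hit had instead been a gene already in the first list (say `{"A5"}`), the result would be 10 rather than 9, a difference that the plain count arithmetic would not notice.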
Is this pipeline correct? Thanks in advance!!!
That looks reasonable. You're essentially describing reciprocal best BLAST hits. You might want to read up on this technique and see whether you can improve your approach.
Thanks a lot for the response and for the protocol :-)!!!
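Reciprocal best BLAST hits, as mentioned in the reply, can be sketched as follows. This assumes two `-outfmt 6` result files (Bx queried against Ax, and Ax queried against Bx) and takes the highest bitscore as the definition of "best"; ties are resolved by keeping the first hit seen, which is a simplifying assumption. The `row` helper and all IDs are hypothetical test data.

```python
import io
from typing import Dict, List, TextIO, Tuple

QSEQID, SSEQID, BITSCORE = 0, 1, 11  # column positions in default -outfmt 6 output


def best_hits(handle: TextIO) -> Dict[str, str]:
    """Map each query ID to its highest-bitscore subject (first hit wins on ties)."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[QSEQID], fields[SSEQID]
        score = float(fields[BITSCORE])
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(b_vs_a: TextIO, a_vs_b: TextIO) -> List[Tuple[str, str]]:
    """(B gene, A gene) pairs where each gene is the other's best hit."""
    best_ba = best_hits(b_vs_a)  # Bx queried against the Ax database
    best_ab = best_hits(a_vs_b)  # Ax queried against the Bx database
    return sorted((b, a) for b, a in best_ba.items() if best_ab.get(a) == b)


def row(query: str, subject: str, bitscore: float) -> str:
    """Hypothetical test row: only columns 1, 2 and 12 matter; the rest are placeholders."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)]) + "\n"


if __name__ == "__main__":
    b_vs_a = io.StringIO(row("B1", "A3", 500) + row("B1", "A4", 150) + row("B2", "A4", 200))
    a_vs_b = io.StringIO(row("A3", "B1", 490) + row("A4", "B1", 210) + row("A4", "B2", 190))
    print(reciprocal_best_hits(b_vs_a, a_vs_b))  # prints: [('B1', 'A3')]
```

Here B2 is not reported: its best hit is A4, but A4's best hit is B1. That one-way-only case is exactly what a single-direction BLAST count would still treat as "shared".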
How far apart are these genomes in evolutionary terms? If they are very closely related, using CD-HIT to cluster away the redundant (shared) genes should leave the differences behind, especially if you have protein sequences.
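For the CD-HIT route, one possible sketch: concatenate both protein sets into one FASTA with IDs prefixed `A_` or `B_` (a hypothetical naming convention), run something like `cd-hit -i combined.faa -o combined_nr.faa -c 0.9 -d 0` (`-d 0` keeps full IDs in the `.clstr` file; the 0.9 identity threshold is only an example), then list the clusters whose members all come from one genome. The `.clstr` layout assumed below (`>Cluster N` headers followed by member lines such as `0\t310aa, >A_gene1... *`) should be checked against your own output.

```python
import io
from typing import List, TextIO


def read_clusters(handle: TextIO) -> List[List[str]]:
    """Parse a CD-HIT .clstr file into lists of member sequence IDs."""
    clusters: List[List[str]] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])  # a new cluster starts
        else:
            # Member line, e.g. "1\t305aa, >B_gene4... at 92.13%": keep the ID between ">" and "..."
            clusters[-1].append(line.split(">", 1)[1].split("...", 1)[0])
    return clusters


def genome_specific(clusters: List[List[str]], prefix: str) -> List[str]:
    """IDs from clusters whose members all start with prefix (hypothetical A_/B_ naming)."""
    specific: List[str] = []
    for members in clusters:
        if all(member.startswith(prefix) for member in members):
            specific.extend(members)
    return sorted(specific)


if __name__ == "__main__":
    demo = io.StringIO(
        ">Cluster 0\n"
        "0\t310aa, >A_gene1... *\n"
        "1\t305aa, >B_gene4... at 92.13%\n"
        ">Cluster 1\n"
        "0\t220aa, >A_gene2... *\n"
        ">Cluster 2\n"
        "0\t150aa, >B_gene9... *\n"
    )
    clusters = read_clusters(demo)
    print(genome_specific(clusters, "A_"), genome_specific(clusters, "B_"))
    # prints: ['A_gene2'] ['B_gene9']
```

Mixed clusters (like cluster 0) correspond to shared genes; single-genome clusters are the candidates for non-shared genes. With more distant genomes a high identity threshold will split true homologs apart, which is why the evolutionary distance matters here.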
The other question here is: why exactly create this pipeline? If you're looking for accessory genes, lots of pangenome tools already exist that can give you this information.
I will check whether I get the same numbers as with my approach!! No, I am not looking for anything in particular. I'm just trying to understand the "bioinformatician way of thinking": combining a few command-line tools, a few Perl scripts (even some basic ones), trying out some software, etc. It's a new, chaotic world for me :P. Thank you so much for your suggestions.
Cool. I will try the tool you suggested :D. Right now I'm testing what I've learned, running random BLASTs just to understand how things work. The theory is good, but it is completely different when you apply what you've learned!!! Thanks a lot :D