If you have a genome for the organism, you can take the gene behind your BLAST hit and check (using BLAST) whether that gene exists in your organism's genome. Depending on how distantly related the contaminating species is to your target species, you may expect reads from the contaminant not to map to your target genome. So if you map reads to the contaminating gene, you can take those reads and check whether they also map to your genome. If you suspect a particular contaminating species, you could also compare the mapping rate of reads to that species against the mapping rate to your target species. With a combination of BLAST and your aligner, you should be able to work out what is going on.
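A minimal Python sketch of the read-level checks described above. It assumes you already have SAM output from your aligner for the same reads against (a) the suspect contaminant gene and (b) your target genome or assembly, with unmapped reads included in both files. The function names are hypothetical, and reads are tracked by name only, so both mates of a paired-end read count once (a simplification).

```python
def read_names(sam_lines):
    """Return (all read names seen, names with at least one mapped alignment)."""
    seen, mapped = set(), set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & 0x4:  # FLAG bit 0x4 means "segment unmapped"
            mapped.add(name)
    return seen, mapped


def mapping_rate(sam_lines):
    """Fraction of distinct reads in the file that mapped at least once."""
    seen, mapped = read_names(sam_lines)
    return len(mapped) / len(seen) if seen else 0.0


def shared_fraction(suspect_sam, target_sam):
    """Of the reads that map to the suspect gene, the fraction that also map to the target."""
    _, suspect = read_names(suspect_sam)
    _, target = read_names(target_sam)
    return len(suspect & target) / len(suspect) if suspect else 0.0
```

Both functions accept any iterable of lines, such as an open file. A low shared fraction means the reads supporting the suspect gene mostly fail to map to your own genome, which points toward contamination; a high one suggests the gene may be native.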
Thank you for replying.
I do not have the genome of the organism. The organism was subjected to RNAseq and de novo assembly for the first time.
Because the assembly was built from data that may be contaminated, a gene from a different species was also identified.
However, it is possible that the organism genuinely has this gene.
I think it depends on: 1) how similar the BLAST hit is, and 2) how closely related the other organism is. For example, if your organism of interest is a plant and you find a hit with 100% identity to a human gene, this is probably contamination. On the other hand, if you find a hit to a species from the same genus with 90% identity, then it could actually be a real gene. You'll probably have to choose some cutoffs and apply them consistently when filtering for contamination. I don't think there is a way to be 100% sure when external information is limited.
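One way to make such cutoffs explicit is a small filter over BLAST tabular output (`-outfmt 6`, default 12 columns). This is only a sketch: the identity and length thresholds are placeholders to tune, not recommended values. Because the default tabular format carries no taxonomy, the set of subject IDs from close relatives (`related_subjects`, a hypothetical input) is something you would have to build yourself. It also assumes the first line for each query is that query's best hit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float


def parse_outfmt6(lines):
    """Parse BLAST tabular lines (default 12-column -outfmt 6) into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 style comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(query=f[0], subject=f[1], pident=float(f[2]),
                        length=int(f[3]), evalue=float(f[10])))
    return hits


def classify(hit, related_subjects, contam_pident=98.0, min_length=100):
    """Label one hit using placeholder cutoffs (tune these for your data)."""
    if hit.length < min_length:
        return "too short to judge"
    if hit.subject in related_subjects:
        # Hit to a close relative (e.g. same genus): plausibly a native gene.
        return "possibly genuine"
    if hit.pident >= contam_pident:
        # Near-identical hit to a distant organism: typical contamination signal.
        return "likely contamination"
    return "ambiguous"


def label_queries(hits, related_subjects, **cutoffs):
    """Classify each query by its first (assumed best) hit."""
    labels = {}
    for hit in hits:
        if hit.query not in labels:
            labels[hit.query] = classify(hit, related_subjects, **cutoffs)
    return labels
```

Hits labelled "ambiguous" are the ones where, as noted above, there is no way to be certain without more external information.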