15 months ago
Riyad
Hi,
My blastp output looks like the following four lines. I am looking for nuclear genes. The mitochondrial, cytoplasmic, and chloroplastic hits are clearly labelled, but there are a bunch of hits that don't mention any localization, for example the fourth one here. How can I tell which hits are nuclear gene matches in a typical blastp output?
hit_1 4.62e-155 66.358 RecName: Full=Malate dehydrogenase, **mitochondrial**; Flags: Precursor [Fragaria x ananassa]
hit_2 4.05e-88 49.485 RecName: Full=Protein RETICULATA-RELATED 1, **chloroplastic**; Flags: Precursor [Arabidopsis thaliana]
hit_3 7.66e-104 48.225 RecName: Full=Phenylalanine--tRNA ligase beta subunit, **cytoplasmic**; AltName: Full=Phenylalanyl-tRNA synthetase beta subunit; Short=PheRS [Arabidopsis thaliana]
hit_4 8.10e-108 80.556 RecName: Full=Peptidyl-prolyl cis-trans isomerase CYP22; Short=PPIase CYP22; AltName: Full=Cyclophilin of 22 kDa; AltName: Full=Cyclophilin-22 [Arabidopsis thaliana]
Just for clarification: are you looking for nuclear-encoded gene sequences (the DNA sequence of the gene is encoded in the nuclear vs. plastid genomes) or nuclear proteins (the protein product of the gene has sub-cellular localization in the nucleus)? These questions are very different. I think the localizations you have highlighted refer to the latter and the question of sub-cellular localization of products is more common and more interesting in my opinion.
Also, what is your BLAST database? It looks like TrEMBL, but you did not include any accessions in the output file. If you want to solve this task, be prepared to run the BLASTP analysis again with output options that include the subject accessions, so that you can look them up. If you need to look up the subcellular localization, make sure to use only SwissProt/TrEMBL or a subset thereof. You can then look up the subcellular localization of the proteins in UniProtKB: https://www.uniprot.org/uniprotkb/A0A0A0KDZ4/entry#subcellular_location
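To get accessions into the output, one option is a custom tabular format such as `-outfmt "6 qseqid sacc evalue pident stitle"`. A minimal Python sketch for reading such a file and tagging each hit by the localization suffix in its title could look like this. The column order, the `parse_hits` helper, and the placeholder accessions are assumptions for illustration. Note that the tag describes where the protein product ends up, not which genome encodes it.

```python
import io

# Assumed column order, i.e. blastp was run with
#   -outfmt "6 qseqid sacc evalue pident stitle"
COLUMNS = ("qseqid", "sacc", "evalue", "pident", "stitle")

# Localization suffixes seen in the Swiss-Prot titles in this thread.
LOCALIZATION_TAGS = ("mitochondrial", "chloroplastic", "cytoplasmic")


def parse_hits(handle):
    """Yield one dict per tab-separated BLAST hit line, with a localization tag."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        title = hit["stitle"].lower()
        # First matching suffix wins; titles without any suffix are "unannotated".
        hit["localization"] = next(
            (tag for tag in LOCALIZATION_TAGS if tag in title), "unannotated"
        )
        yield hit


# Placeholder query IDs and accessions (ACC_1 and ACC_4 are not real accessions).
example = io.StringIO(
    "query_1\tACC_1\t4.62e-155\t66.358\t"
    "RecName: Full=Malate dehydrogenase, mitochondrial; Flags: Precursor\n"
    "query_4\tACC_4\t8.10e-108\t80.556\t"
    "RecName: Full=Peptidyl-prolyl cis-trans isomerase CYP22\n"
)
hits = list(parse_hits(example))
for hit in hits:
    print(hit["qseqid"], hit["sacc"], hit["localization"])
```

With real accessions in the `sacc` column, the "unannotated" hits are the ones worth looking up individually in UniProtKB.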
Thanks for the feedback, but it is still not clear to me.
Here are more details. I am working on a phylogeny project. I ran RNA-seq on around 30 samples, assembled the reads with Trinity, and translated the assemblies into amino acid sequences. So I now have amino acid sequences for all of my samples, and I will use them for phylogenetic analysis. Importantly, I need only those sequences that come from the nuclear genome (not from organelles) for my phylogeny work.
I am running blastp with my amino acid FASTA file against the Swiss-Prot database. In the output file, many hits show a specific localization for organelles, and even for the cytoplasm, but I can't see any hits labelled "nuclear". From these BLAST hits, how can I select the target sequences for my phylogeny work? As I mentioned earlier, I only need sequences that come from the nuclear genome, not from organelles.
Expecting a good solution...
I think you need to define your orthologue extraction strategy properly; pure BLASTP is insufficient. Selecting nuclear-encoded genes alone will not do the trick either. The sequences need to be aligned with their respective orthologues to create an MSA suitable for phylogenetic analysis. To devise a good solution, we need to know what your samples are. If they are from closely related strains, for example, the strategy should be based on nucleotide sequences, not protein sequences.
I suggest you first run 1:1 orthologue detection, e.g. with OrthoFinder. Then filter the orthologous groups for nuclear genes if need be, and use the remainder to create a concatenated MSA.
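As a rough illustration of the filtering step, the sketch below reads an orthogroup table and keeps only the groups with exactly one gene per sample. It assumes a tab-separated layout like OrthoFinder's `Orthogroups.tsv` (first column the orthogroup ID, then one column per sample holding comma-separated gene IDs); please check this against your own output. The function name, sample names, and gene IDs are made up.

```python
import csv
import io


def single_copy_orthogroups(handle):
    """Return {orthogroup_id: {sample: gene_id}} for groups with one gene per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # every column after the orthogroup ID is one sample
    kept = {}
    for row in reader:
        if not row:
            continue
        og_id, cells = row[0], row[1:]
        # Pad short rows so that trailing samples without genes count as empty.
        cells = cells + [""] * (len(samples) - len(cells))
        genes_per_sample = [
            [gene.strip() for gene in cell.split(",") if gene.strip()]
            for cell in cells
        ]
        # Keep the group only if every sample contributes exactly one gene.
        if all(len(genes) == 1 for genes in genes_per_sample):
            kept[og_id] = {
                sample: genes[0]
                for sample, genes in zip(samples, genes_per_sample)
            }
    return kept


# Toy table: only OG0000001 has exactly one gene in each of the three samples.
table = io.StringIO(
    "Orthogroup\tsampleA\tsampleB\tsampleC\n"
    "OG0000001\tA_g1\tB_g1\tC_g1\n"
    "OG0000002\tA_g2, A_g3\tB_g2\tC_g2\n"
    "OG0000003\tA_g4\t\tC_g4\n"
)
single_copy = single_copy_orthogroups(table)
print(sorted(single_copy))
```

OrthoFinder reports single-copy orthogroups in its own results as well, so a filter like this is mainly useful if you want to relax the rule, e.g. to tolerate a few missing samples.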
Also, what you are seeing as localization is the localization of the product, which is not necessarily the same as the genome of origin; e.g. many mitochondrial proteins are encoded in the nuclear genome.
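If you do need the genome of origin, UniProtKB flat-file entries carry an `OG` (organelle) line for proteins encoded on an organellar genome or a plasmid, e.g. `OG   Mitochondrion.`. For eukaryotes, an entry without an `OG` line is generally one encoded in the nuclear genome. A minimal sketch of checking this, using two toy records (not real entries) and a hypothetical helper `organelle_of_origin`:

```python
def organelle_of_origin(flatfile_text):
    """Return the OG line content of one UniProtKB flat-file entry, or None."""
    parts = []
    for line in flatfile_text.splitlines():
        # Flat-file lines start with a two-letter code followed by three spaces.
        if line.startswith("OG   "):
            parts.append(line[5:].strip())
    if not parts:
        return None
    return " ".join(parts).rstrip(".")


# Toy record 1: the product is mitochondrial, but there is no OG line,
# so the gene itself is taken to be nuclear-encoded.
nuclear_encoded = """ID   TOY1_EXAMPLE            Reviewed;         100 AA.
DE   RecName: Full=Toy dehydrogenase, mitochondrial;
OS   Examplus plantus.
//"""

# Toy record 2: the OG line says the gene sits on the mitochondrial genome.
organelle_encoded = """ID   TOY2_EXAMPLE            Reviewed;         100 AA.
DE   RecName: Full=Toy oxidase subunit 1;
OS   Examplus plantus.
OG   Mitochondrion.
//"""

print(organelle_of_origin(nuclear_encoded))    # None
print(organelle_of_origin(organelle_encoded))  # Mitochondrion
```

The first toy record is the case described above: a ", mitochondrial" suffix in the name, yet nothing that places the gene on the organellar genome.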