Hi everyone,
I am stuck with the following problem:
From a 'BLAST result' table in a database I want to select only those entries which fulfill some specific requirements (e.g. contain a keyword) and are from species that are 'children' of a given taxon 'XY' with the id 123.
E.g., I want all entries which belong to species which themselves belong to the superkingdom Bacteria. The query that takes into account the specific keyword, etc. works fine.
However, I find it very hard to get a good idea how to write the query so that it only takes the 'children' of a taxon.
The table 'taxonomy' was created from the NCBI taxonomy names.dmp and nodes.dmp files. The taxon_id is the NCBI taxonomy ID, while the other IDs are from the database only. It looks like this:
id | taxon_id | parent_taxon_id | taxon_name | rank
---------+----------+-----------------+------------+--------------
3963792 | 2 | 131567 | Bacteria | superkingdom
The 'BLAST result' table contains the foreign key 'taxonomy_id' of the taxonomy table. It looks like this (I removed some columns with more information on the actual BLAST hit):
id | score | tax_id | blast_spec | taxonomy_id
--------+-------+--------+--------------------+-------------
850859 | 722 | 9483 | Callithrix jacchus | 3981316
850860 | 726 | 10141 | Cavia porcellus | 3982450
My idea would be, to make a FOR loop and iterate over all results that I get and check, whether the 'taxonomy_id' in there has a parent_taxon_id that has the 123 as its parent_taxon_id.
However, I have absolutely no clue how to design such a loop in SQL and how to make sure it does not run forever. I have not yet understood the concept of joins, but these probably would also help in this particular problem?
This is probably a problem that a person with more experience in SQL than me can easily handle..?
I am very thankful for all your advice!
Can you add the table 'BLAST result', and what the foreign keys are?
For 'BLAST result' it says: Foreign-key constraints: "BLASTresult_taxonomy_id_fkey" FOREIGN KEY (taxonomy_id) REFERENCES taxonomy(id)
Why does the blast result table have both a tax_id and a taxonomy_id column?
Ah, taxonomy_id refers to taxonomy(id) while tax_id refers to taxonomy(taxon_id). Having redundancy like this is not the best idea.