I have been analyzing microsyntenic regions between different species using the OMA Python API (https://github.com/DessimozLab/pyomadb). Now I would like to download the FASTA sequences of these regions with a script, but the way chromosomes are identified varies across species, which makes the extraction more complex.
For example, for Bos taurus I can find and fetch chromosome 13 from RefSeq without any issues. However, for other species, such as Ailuropoda melanoleuca, the chromosome is represented as an "unplaced genomic scaffold" with the accession number GL192479.1, and the same approach doesn't work.
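To make the scaffold case concrete, here is a minimal sketch of how a sub-region could be requested directly by accession. It assumes the NCBI E-utilities efetch endpoint with its `seq_start`/`seq_stop` parameters (1-based coordinates); the helper name `build_efetch_url` and the coordinates are hypothetical, and the sketch only builds the URL without downloading anything.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (assumed here as the download route)
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, start=None, stop=None):
    """Build an efetch URL returning FASTA for an accession.

    start/stop are 1-based, inclusive coordinates; when both are given,
    only that sub-region is requested. (Hypothetical helper name.)
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    if start is not None and stop is not None:
        params["seq_start"] = start
        params["seq_stop"] = stop
    return EFETCH_URL + "?" + urlencode(params)


# Example coordinates are made up for illustration
url = build_efetch_url("GL192479.1", start=1, stop=1000)
print(url)

# The actual download would then be something like:
#   from urllib.request import urlopen
#   fasta_text = urlopen(url).read().decode()
```

This only helps when an accession is already known; a bare chromosome name would still have to be mapped to an accession for the right assembly first.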
I am relatively new to working with this type of data, so I may well be overlooking something. If you have any suggestions, or know of other programs that can accomplish this task, I would greatly appreciate your input.
Thanks!
The thing is that I have cases where the value in the chromosome column is just "15", not an accession number, and sometimes there is an accession without the word "scaffold". So I guess I will need an if statement with a regex to discriminate between these cases and treat each one differently.
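A rough sketch of what I mean is below. The function name `classify_chromosome` and both patterns are my own assumptions: the accession pattern expects 1 to 4 capital letters, an optional underscore, at least 5 digits and an optional version suffix, so identifiers with longer prefixes would fall through to "unknown".

```python
import re

# Plain chromosome names: "15", "chr15", "X", "MT", ... (assumed pattern)
CHROM_NAME_RE = re.compile(r"^(?:chr)?(\d+|[XYZW]|MT?)$", re.IGNORECASE)

# Accession-like tokens: "GL192479.1", "NC_000001.11", ... (assumed pattern)
ACCESSION_RE = re.compile(r"\b([A-Z]{1,4}_?\d{5,}(?:\.\d+)?)\b")


def classify_chromosome(value):
    """Return (kind, identifier) for a value from the chromosome column.

    kind is "name" for a bare chromosome name, "accession" when an
    accession can be found anywhere in the text (with or without the
    word "scaffold"), and "unknown" otherwise.
    """
    value = value.strip()
    match = CHROM_NAME_RE.match(value)
    if match:
        return ("name", match.group(1).upper())
    match = ACCESSION_RE.search(value)
    if match:
        return ("accession", match.group(1))
    return ("unknown", value)


for raw in ["15", "GL192479.1", "unplaced genomic scaffold GL192479.1"]:
    print(raw, "->", classify_chromosome(raw))
```

Values classified as "accession" could be fetched directly, while "name" values would still need to be mapped to an accession for the right assembly.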
Thanks!