Hi,
I am trying to obtain the cDNA sequences for ortholog groups. There is a "Download: Fasta" button, but apparently it only works for protein sequences. Does anybody know if there is a way to download them in bulk, as opposed to downloading every single sequence individually?
Currently there is no way to directly get the CDS sequences for all the proteins in an OMA group or HOG from the web interface. However, you can get them with very little programmatic effort through the REST API. Here is one possible way to fetch them in Python and output them as FASTA:
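import json
import requests

grp = 12345  # number of the OMA group of interest
# Fetch the group and collect the OMA IDs of its members
group = json.loads(requests.get('https://omabrowser.org/api/group/{}/'.format(grp)).content.decode())
group_member_entries = [p['omaid'] for p in group['members']]
# Retrieve the full protein records of all members in a single request
reply = requests.post('https://omabrowser.org/api/protein/bulk_retrieve/', json={"ids": group_member_entries})
group_members = json.loads(reply.content.decode())
# Print each member as a FASTA record with its cDNA sequence
for memb in group_members:
    print(">{}\n{}\n".format(memb['omaid'], memb['cdna']))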
We also see that this could be a generally useful feature and will therefore implement it for the next release.
Have you looked here? There seems to be a download for all cDNA sequences.
Yes, this is what I did, but it would be nice to have a bulk download of cDNA for selected sequences, just as there is for proteins
but there is, no?
which should be all the eukaryotic cDNAs
There is not. I don't want to download a 2.4Gb compressed file and parse it every time I want the cDNA of a bunch of orthologs. There is an option to download all the ortholog protein sequences when you search a particular protein. I was looking for the same with cDNA.
aha, ok, true, got your issue now.
Still, if the IDs are consistent, I would download the cDNA file once, format it as a BLAST database, and repeatedly query that for the CDSs I need.
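If setting up a BLAST database feels like overkill, the "download once, query repeatedly" idea can also be roughed out in plain Python. This is only a sketch under assumptions: it swaps the BLAST database for a simple scan of the FASTA file, it assumes the first whitespace-delimited token of each header is the ID you are looking up, and the helper names (read_fasta, extract, open_fasta) are made up for illustration. I have not checked the header format of the actual download file.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first token of the header line
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def extract(lines: Iterable[str], wanted: Iterable[str]) -> Dict[str, str]:
    """Return the sequences whose IDs are in `wanted` (one pass over the file)."""
    wanted_ids = set(wanted)
    return {sid: seq for sid, seq in read_fasta(lines) if sid in wanted_ids}


def open_fasta(path: str):
    """Open a FASTA file as text, transparently handling .gz compression."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

With the file downloaded once, something like `with open_fasta(path) as fh: seqs = extract(fh, my_ids)` returns a dict of ID to cDNA, where `path` and `my_ids` are whatever local file name and ID list you have. Every call re-reads the whole file, so for many repeated lookups on a 2.4 GB download a real index (the BLAST database suggested above, for instance) will be faster.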