3.3 years ago
Nelo
Hi
Is there any way to extract the multiple protein sequences reported in a published paper using its PMID, DOI, or supplementary files?
Thanks
It's unlikely you will be able to go directly from a paper's DOI to a sequence. If the paper lists the databases the data were uploaded to, along with accession numbers, then it might be possible, but we'd need more information about what exactly the paper says.
Yes, some papers mention accession numbers, but others give only the number of proteins they found in genome-wide studies of a specific plant species, with no accession numbers. That's why I am looking for a program that can download the sequences using the title, PMID or DOI.
Caveat: This is likely not going to work for most papers. But if you have the right PMID then you could do the following.
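A minimal sketch of such a pipeline, assuming NCBI Entrez Direct (`esearch`, `elink`, `efetch`) is installed and on your PATH. The function name `pmid_to_proteins` is hypothetical, and the command originally posted may have differed in its details. It only returns proteins that NCBI has linked to the PubMed record, which is why it will not work for most papers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, in FASTA format, the protein records that
# NCBI has linked to a given PubMed ID.
# Assumption: Entrez Direct (esearch, elink, efetch) is installed.
pmid_to_proteins() {
    local pmid="$1"
    if [ -z "$pmid" ]; then
        echo "usage: pmid_to_proteins PMID" >&2
        return 1
    fi
    # 1. find the PubMed record, 2. follow its links to the protein
    # database, 3. download the linked records as FASTA
    esearch -db pubmed -query "${pmid}[pmid]" |
        elink -target protein |
        efetch -format fasta
}

# Preview only the first 10 FASTA header lines (needs network access):
#   pmid_to_proteins 22753475 | grep ">" | head -10
```

If the preview prints nothing, the paper most likely has no protein records linked to it in NCBI, and the sequences would have to come from the accession numbers or supplementary files instead.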
First of all, thank you so much for replying again.
So the number '22753475' is the PMID, I guess, but what is the last part, 'grep ">" | head -10', for? Is it limiting the number of results we get? You got exactly 10 results here.
Also, it has been 10 minutes since I ran this command and it is still running.
22753475 is the PMID. I added the part from `grep` onwards only to demonstrate that this works: it prints just the first 10 FASTA header lines. You will need to take that part out to save the sequences. Simply redirect to a file: `esearch .. blah > seq.fa`.
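Once the output has been redirected to a file, one quick way to check how many sequences were actually saved is to count the FASTA header lines. A small sketch, where the function name `count_fasta_records` is hypothetical; it works on a local file and does not contact NCBI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the sequences in a FASTA file.
# Each record starts with a header line beginning with ">", so counting
# those lines gives the number of sequences.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage, once seq.fa exists:
#   count_fasta_records seq.fa
```

Unlike `head -10`, this does not truncate anything; it only reports how many records are in the file.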