Extract protein sequence
7 months ago
anna ▴ 20

I have a large FASTA file from a new species, and I want to find and extract a particular protein sequence from it. I also know the protein sequence from a similar species, which could potentially be used to find the corresponding sequence in my data. How could I do it? Thank you very much!

fasta alignment blast • 645 views
7 months ago

You didn't mention what kind of organism you are working with or what kind of protein you are looking for.

If it happens to be a microbial genome and the protein is an antimicrobial peptide, an antibiotic resistance gene, or part of a biosynthetic gene cluster, you could try running funcscan or genomeannotator (no stable release yet, though) on the data. Bactopia and Anvi'o also feature appropriate tools for annotation. Finding your protein of interest in the annotated assembly is probably the more straightforward route.
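To illustrate that last point: once an annotation tool has written out a protein FASTA, picking out the protein can be as simple as filtering the headers by product name. This is only a minimal sketch. It assumes the product description appears in the FASTA header, and the record names, sequences, and the keyword below are made up for the example; real output from the tools above may be laid out differently.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def select_by_keyword(records, keyword):
    """Keep records whose header contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [(h, s) for h, s in records if keyword in h.lower()]


# Hypothetical annotated proteins, for illustration only.
annotated = """>prot_0001 hypothetical protein
MKTAYIAKQR
>prot_0002 beta-lactamase class A
MSIQHFRVAL
IPFFAAFCLP
>prot_0003 DNA gyrase subunit A
MSDLAREITP
"""

matches = select_by_keyword(parse_fasta(annotated), "beta-lactamase")
for header, seq in matches:
    print(f">{header}\n{seq}")
```

For a real file you would read the text from disk instead of the inline string; a plain substring match will of course miss proteins that were annotated under a different name.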

If you wish to start from the known protein sequence of the similar species, I would first search it for protein domains. Domains are usually better conserved than the remainder of the protein. Take the corresponding nucleotide sequences of the domains from the reference of that species and try seqkit fish or seqkit locate on your scaffolds. With luck, you will locate a site in your assembly where similar domains lie next to each other.
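To make the idea concrete, here is a rough plain-Python sketch of what such a search does conceptually: slide a domain's nucleotide sequence along each scaffold, on both strands, and report positions that match with at most a few substitutions. This is not how seqkit is implemented and it will be far slower on a real assembly; the function names, the toy scaffolds, and the query are all made up for the illustration. For actual data, use seqkit itself and check its documentation for the available options.

```python
import io

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(query, target, max_mismatches=0):
    """Return 0-based starts where query matches target with at most
    max_mismatches substitutions (no gaps are considered)."""
    hits = []
    q = len(query)
    for start in range(len(target) - q + 1):
        mismatches = 0
        for a, b in zip(query, target[start:start + q]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Loop finished without exceeding the mismatch budget.
            hits.append(start)
    return hits


def locate_both_strands(query, scaffolds, max_mismatches=0):
    """Return (scaffold, strand, start, end) tuples with 1-based coordinates."""
    query = query.upper()
    rc = reverse_complement(query)
    results = []
    for name, seq in scaffolds:
        for start in locate(query, seq, max_mismatches):
            results.append((name, "+", start + 1, start + len(query)))
        if rc != query:  # avoid reporting palindromic queries twice
            for start in locate(rc, seq, max_mismatches):
                results.append((name, "-", start + 1, start + len(query)))
    return results


# Toy scaffolds and query, for illustration only.
demo = io.StringIO(">scaffold_1\nTTTACGTTGCAAA\n>scaffold_2\nGGGTGCAACGTCC\n")
hits = locate_both_strands("ACGTTGCA", read_fasta(demo), max_mismatches=1)
for hit in hits:
    print(*hit, sep="\t")
```

Several domain hits close together on the same scaffold and strand would be the kind of site worth inspecting more closely.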

Good luck!


Thank you! I need this for a eukaryotic organism and known protein sequences, but I guess seqkit would work well!
