10.0 years ago
Ole Kristian Tørresen
Hi,
I'd like to remove all repeat-derived proteins (such as transposon proteins) from a UniProtKB/Swiss-Prot file (for instance ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz). Is there an easy way to do it? Is there, for instance, somewhere I can find all GO terms associated with transposon proteins?
Thank you.
Thank you, Siva.
I'd expect that there are certain GO terms associated with transposons, but I have not been able to find out whether all transposon-derived proteins in UniProtKB/Swiss-Prot are consistently annotated with the right GO term (nor which GO term would be the correct one). I'm unsure whether the post you are referring to would help. I could download RepBase and compare everything there to the UniProtKB/Swiss-Prot FASTA file, but there should be an easier way to do it.
I agree that using already existing annotation is easier than searching RepBase or Pfam domains against UniProt. I am not familiar with the GO annotation in UniProt. Can you try the UniProt keywords?
For example, you could search the UniProt data with the keyword "transposable element".
There also seems to be an entry called "TRANSPOSON" in the optional Reference Comment (RC) line in the sequence entry.
I am not sure whether these two options give the same results, or whether either of them finds all the proteins encoded in transposons.
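In case it helps, here is a minimal Python sketch of how the two suggestions above could be combined. It assumes you also download the Swiss-Prot flat file (the `.dat` companion of the FASTA file), that keywords sit on `KW` lines separated by semicolons, and that the transposon token appears as `TRANSPOSON=` on `RC` lines. The function names and the toy accessions (`TOY001` and so on) are made up for illustration, and I have not checked how complete the result is on a real release.

```python
import re


def flagged_accessions(flat_text, keyword="Transposable element"):
    """Collect primary accessions of flat-file entries that list `keyword`
    on a KW line or carry a TRANSPOSON= token on an RC line."""
    flagged = set()
    primary, hit = None, False
    for line in flat_text.splitlines():
        code, body = line[:2], line[5:]
        if code == "AC" and primary is None:
            # The first accession on the first AC line is the primary one.
            primary = body.split(";")[0].strip()
        elif code == "KW":
            # Drop evidence tags such as {ECO:...} before comparing names.
            cleaned = re.sub(r"\{[^}]*\}", "", body)
            names = [k.strip().rstrip(".").strip().lower()
                     for k in cleaned.split(";")]
            hit = hit or keyword.lower() in names
        elif code == "RC":
            hit = hit or "TRANSPOSON=" in body
        elif code == "//":
            # End of entry: record it if flagged, then reset the state.
            if hit and primary:
                flagged.add(primary)
            primary, hit = None, False
    return flagged


def filter_fasta(lines, exclude):
    """Yield FASTA lines, skipping records whose accession is in `exclude`.
    Assumes UniProt-style headers: >sp|ACCESSION|ENTRY_NAME description."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split("|")
            if len(parts) >= 3:
                acc = parts[1]
            else:
                fields = line[1:].split()
                acc = fields[0] if fields else ""
            keep = acc not in exclude
        if keep:
            yield line


# Toy data (hypothetical entries, not real UniProt records).
TOY_FLAT = """ID   TOY1_ECOLI  Reviewed; 4 AA.
AC   TOY001; TOY009;
KW   DNA recombination; Transposable element; Transposition.
//
ID   TOY2_HUMAN  Reviewed; 4 AA.
AC   TOY002;
KW   Kinase; Transferase.
//
ID   TOY3_ECOLI  Reviewed; 4 AA.
AC   TOY003;
RC   TRANSPOSON=Tn3;
KW   Antibiotic resistance.
//
"""

TOY_FASTA = [
    ">sp|TOY001|TOY1_ECOLI Toy transposase\n", "MKTA\n",
    ">sp|TOY002|TOY2_HUMAN Toy kinase\n", "MGSS\n",
    ">sp|TOY003|TOY3_ECOLI Toy resistance protein\n", "MSIQ\n",
]

flagged = flagged_accessions(TOY_FLAT)
kept = list(filter_fasta(TOY_FASTA, flagged))
print("".join(kept), end="")
```

For the real files you would read the flat file with something like `gzip.open(path, "rt")` and pass `handle.read()` (or adapt the function to stream line by line, since the file is large), then write the output of `filter_fasta` to a new FASTA file. Matching on the primary accession only is a simplification; it works because the FASTA headers use the primary accession.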
The keyword "transposable element" is a great suggestion. That's the most comprehensive way of attacking this problem that I've come across.
Thank you.