CAZy database has multiple family sequence...
23 months ago
GYUDAE • 0

Hello,

I'm a graduate student studying bioinformatics.

I have a question about using the CAZy database downloaded from the dbCAN website.

When I annotate my protein sequences against the CAZy database, some annotation results show multiple protein families, such as 'GT4|GT97'. I checked the database downloaded from the dbCAN website, and some amino acid sequences are indeed labeled with multiple families, like the one below:

>AIZ26250.1|GT4|GT97
MAVIIFVNGIRAVNGLVKSSINTANAFAEEGLDVHLINFVGNITGAEHLYPPFHLHPNVKTSSIIDLFNDIPENVSCRNTPFYSIHQQFFKAEYSAHYKHVLMKIESLLSAEDSIIFTHPLQLEMYRLANNDIKSKAKLIVQIHGNYMEEIHNYEILARNIDYVDYLQTVSDEMLEEMHSHFKIKKDKLVFIPNITYPISLEKKEADFFIKDNEDIDNAQKFKRISIVGSIQPRKNQLDAIKIINKIKNENYILQIYGKSINKDYFELIKKYIKDNKLQNRILFKGESSEQEIYENTDILIMTSESEGFPYIFMEGMVYDIPIVVYDFKYGANDYSNYNENGCVFKTGDISGMAKKIIELLNNPEKYKELVQYNHNRFLKEYAKDVVMAKYFTILPRSFNNVSLSSAFSRKELDEFQNITFSIEDSNDLAHIWNFELTNPAQNMNFFALVGKRKFPMDAHIQGTQCTIKIAHKKTGNLLSLLLKKRNQLNLSRGYTLIAEDNSYEKYIGAISNKGNFEIIANKKSSLVTINKSTLELHEIPHELHQNKLLIALPNMQTPLKITDDNLIPIQASIKLEKIGNTYYPCFLPSGIFNNICLDYGEESKIINFSKYSYKYIYDSIRHIEQHTDISDIIVCNVYSWELIRASVIESLMEFTGKWEKHFQTSPKIDYRFDHEGKRSMDDVFSEETFIMEFPRKNGIDKKTAAFQNIPNSIVMEYPQTNGYSMRSHSLKSNVVAAKHFLEKLNKIKVDIKFKKHDLANIKKMNRIIYEHLGININIEAFLKPRLEKFKREEKYFHDFFKRNNFKEVIFPSTYWNPGIICAAHKQGIKVSDIQYAAITPYHPAYFKSPKSHYVADKLFLWSEYWNHELLPNPTREIGSGAAYWYALDDVRFSEKLNYDYIFLSQSRISSRLLSFAIEFALKNPQLQLLFSKHPDENIDLKNRIIPDNLIISTESSIQGINESRVAVGVYSTSLFEALACGKQTFVVKYPGYEIMSNEIDSGLFFAVETPEEMLEKTSPNWVAVADIENQFFGQEK

The database also contains sequences labeled with only a single protein family, such as:

>AGU84174.1|GT4
MRICLVLEGSYPYVHGGVSTWMHQYITEMKEHEFIIWVIGANEEKKGAFVYEFPENVVEVHEVFLDSLGSSKIIEKKSEELSREEYDALKQLVFCAKPDWSLIFDLLQEGKIQRDDFLVSEAFFQMIQDLCEEKYAAQPMSDVFHTIRSILFPLLMLLTSEIPIADAYHAICTGYGGILATLASYRMGKPLLLTEHGIYTREREEEILRADWILPSMRKQWIDFFYMLSDAIYSKADCITSLFSKARETQIEIGCEPNKCRVISNGIDYESFSKIPFEKDDDSWINIGAAVRMAPIKDIKTMIYAFYEVSAQIPNVRLYIMGGVDDKAYAEECYALARKLKLENLIFTGRVDIKEYLRKMDFMILTSISEGQPLSILESMAAGKPCVTTDVGCCKELLEGREDDELGVAGYCVPPTDLMSLAHAMIVMARSEEKRLKMGQIAKKRSEQFYQYHQMIEQYRQLYKEYVR

So, here is my question:

When I do further analysis combining the annotation results with an abundance table, do I have to merge the abundances, or should I treat those annotation results individually?


Thank you for reading my question...

Gyudae LEE

metagenomics cazy annotation • 1.1k views

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

Hi, have you solved this problem? I ran into the same issue after finishing the BLAST and merge steps. When I wanted to calculate gene abundance in the subsequent analysis, I wasn't sure how to classify or summarise a protein sequence that belongs to multiple enzyme families. Could you help me with it, please?

Thanks so much.


One approach is to count abundance per domain (family). Suppose there are 10 CAZymes classified as GT4 and 10 classified as GT4+GT97. Then the abundance of GT4 is 20 and the abundance of GT97 is 10.
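A minimal Python sketch of that counting, in case it helps. It assumes the annotations are available as a mapping from gene ID to a pipe-separated family string (as in the `GT4|GT97` headers) and the abundances as a mapping from gene ID to a number. The function name `family_abundance`, the gene IDs, and the abundance values are all made up for illustration; they are not part of dbCAN or any other tool.

```python
from collections import Counter

def family_abundance(gene_annotations, gene_abundance):
    """Sum abundance per family, crediting every family listed for a gene."""
    totals = Counter()
    for gene, annotation in gene_annotations.items():
        # A label such as "GT4|GT97" credits both GT4 and GT97
        # with the full abundance of that gene.
        for family in annotation.split("|"):
            if family:  # skip empty fields from stray separators
                totals[family] += gene_abundance.get(gene, 0)
    return dict(totals)

# Hypothetical data mirroring the example above:
# 10 genes annotated GT4 and 10 annotated GT4|GT97, each with abundance 1.
annotations = {f"single_{i}": "GT4" for i in range(10)}
annotations.update({f"multi_{i}": "GT4|GT97" for i in range(10)})
abundance = {gene: 1 for gene in annotations}

result = family_abundance(annotations, abundance)
print(result)  # {'GT4': 20, 'GT97': 10}
```

Note that with this scheme the per-family totals add up to more than the total gene abundance (30 versus 20 here), because multi-family genes are counted once per family. That is fine for comparing families across samples, but the totals should not be read as a partition of the genes.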


Thanks so much for your kind help and time! I will give it a try.
