CAZy database has sequences labelled with multiple families...

Posted 2.7 years ago by GYUDAE

Hello,

I'm a graduate student studying bioinformatics.

I have a question about using the CAZy database downloaded from the dbCAN website.

When I annotate my protein sequences with the CAZy database, some annotation results show multiple protein families, like 'GT4|GT97'. So I checked the database downloaded from the dbCAN website, and some amino acid sequences are named with multiple families, like the one below:

>AIZ26250.1|GT4|GT97
MAVIIFVNGIRAVNGLVKSSINTANAFAEEGLDVHLINFVGNITGAEHLYPPFHLHPNVKTSSIIDLFNDIPENVSCRNTPFYSIHQQFFKAEYSAHYKHVLMKIESLLSAEDSIIFTHPLQLEMYRLANNDIKSKAKLIVQIHGNYMEEIHNYEILARNIDYVDYLQTVSDEMLEEMHSHFKIKKDKLVFIPNITYPISLEKKEADFFIKDNEDIDNAQKFKRISIVGSIQPRKNQLDAIKIINKIKNENYILQIYGKSINKDYFELIKKYIKDNKLQNRILFKGESSEQEIYENTDILIMTSESEGFPYIFMEGMVYDIPIVVYDFKYGANDYSNYNENGCVFKTGDISGMAKKIIELLNNPEKYKELVQYNHNRFLKEYAKDVVMAKYFTILPRSFNNVSLSSAFSRKELDEFQNITFSIEDSNDLAHIWNFELTNPAQNMNFFALVGKRKFPMDAHIQGTQCTIKIAHKKTGNLLSLLLKKRNQLNLSRGYTLIAEDNSYEKYIGAISNKGNFEIIANKKSSLVTINKSTLELHEIPHELHQNKLLIALPNMQTPLKITDDNLIPIQASIKLEKIGNTYYPCFLPSGIFNNICLDYGEESKIINFSKYSYKYIYDSIRHIEQHTDISDIIVCNVYSWELIRASVIESLMEFTGKWEKHFQTSPKIDYRFDHEGKRSMDDVFSEETFIMEFPRKNGIDKKTAAFQNIPNSIVMEYPQTNGYSMRSHSLKSNVVAAKHFLEKLNKIKVDIKFKKHDLANIKKMNRIIYEHLGININIEAFLKPRLEKFKREEKYFHDFFKRNNFKEVIFPSTYWNPGIICAAHKQGIKVSDIQYAAITPYHPAYFKSPKSHYVADKLFLWSEYWNHELLPNPTREIGSGAAYWYALDDVRFSEKLNYDYIFLSQSRISSRLLSFAIEFALKNPQLQLLFSKHPDENIDLKNRIIPDNLIISTESSIQGINESRVAVGVYSTSLFEALACGKQTFVVKYPGYEIMSNEIDSGLFFAVETPEEMLEKTSPNWVAVADIENQFFGQEK

The database also contains sequences assigned to only a single family, for example:

>AGU84174.1|GT4
MRICLVLEGSYPYVHGGVSTWMHQYITEMKEHEFIIWVIGANEEKKGAFVYEFPENVVEVHEVFLDSLGSSKIIEKKSEELSREEYDALKQLVFCAKPDWSLIFDLLQEGKIQRDDFLVSEAFFQMIQDLCEEKYAAQPMSDVFHTIRSILFPLLMLLTSEIPIADAYHAICTGYGGILATLASYRMGKPLLLTEHGIYTREREEEILRADWILPSMRKQWIDFFYMLSDAIYSKADCITSLFSKARETQIEIGCEPNKCRVISNGIDYESFSKIPFEKDDDSWINIGAAVRMAPIKDIKTMIYAFYEVSAQIPNVRLYIMGGVDDKAYAEECYALARKLKLENLIFTGRVDIKEYLRKMDFMILTSISEGQPLSILESMAAGKPCVTTDVGCCKELLEGREDDELGVAGYCVPPTDLMSLAHAMIVMARSEEKRLKMGQIAKKRSEQFYQYHQMIEQYRQLYKEYVR
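For reference, here is a minimal Python sketch of how headers like the two above could be split into an accession and a list of families. It assumes every header follows the `accession|family|family...` pattern seen in these two examples; `parse_cazy_header` and `read_headers` are hypothetical helper names, not part of dbCAN.

```python
def parse_cazy_header(header):
    """Split a header such as '>AIZ26250.1|GT4|GT97' into (accession, families).

    Assumes the first pipe-separated field is the accession and every
    remaining field is a CAZy family label.
    """
    fields = header.lstrip(">").strip().split("|")
    return fields[0], fields[1:]


def read_headers(path):
    """Yield (accession, families) for every header line in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_cazy_header(line)


print(parse_cazy_header(">AIZ26250.1|GT4|GT97"))  # ('AIZ26250.1', ['GT4', 'GT97'])
print(parse_cazy_header(">AGU84174.1|GT4"))       # ('AGU84174.1', ['GT4'])
```

If your copy of the database has extra fields in some headers, the split above would treat them as families too, so check a few headers first.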

So, here is my question:

When I do further analysis combining the annotation results with an abundance table, should I merge the abundances, or should I treat each combined annotation (e.g. 'GT4|GT97') as a separate category?

Thank you for reading my question.

Gyudae LEE

Tags: metagenomics, cazy, annotation

Written 2.7 years ago by GYUDAE; updated 23 months ago by JyiYeung

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

Reply by Ram, 2.7 years ago

Hi, have you solved this problem? I ran into the same issue after finishing the BLAST search and merging the results. When I wanted to calculate gene abundance in the subsequent analysis, I wasn't sure how to classify or summarise a protein sequence that belongs to multiple enzyme families. Could you help me with it, please?

Thanks so much.

Reply by JyiYeung, 23 months ago

One approach is to count the abundance of domains (families) rather than of combined labels. Suppose there are 10 CAZymes classified as GT4 and 10 classified as GT4+GT97. Then the abundance of GT4 is 20, and the abundance of GT97 is 10.
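The domain-counting idea above can be sketched in Python. This assumes each gene's annotation is stored as the pipe-separated label string (e.g. 'GT4|GT97'). `family_counts` and `family_abundance` are hypothetical helper names, and the abundance values in the usage lines are made up for illustration.

```python
from collections import Counter, defaultdict


def family_counts(labels):
    """Count families across annotations.

    Each family listed in a label gets one count, so 'GT4|GT97'
    adds 1 to GT4 and 1 to GT97.
    """
    counts = Counter()
    for label in labels:
        for family in label.split("|"):
            counts[family] += 1
    return counts


def family_abundance(annotations):
    """Sum abundances per family from (label, abundance) pairs, one per gene.

    A multi-family gene adds its full abundance to every family it carries.
    """
    totals = defaultdict(float)
    for label, abundance in annotations:
        for family in label.split("|"):
            totals[family] += abundance
    return dict(totals)


# The example from the answer: 10 proteins labelled GT4, 10 labelled GT4|GT97
labels = ["GT4"] * 10 + ["GT4|GT97"] * 10
print(family_counts(labels))  # Counter({'GT4': 20, 'GT97': 10})

# Hypothetical per-gene abundances taken from an abundance table
annotations = [("GT4", 1.5), ("GT4|GT97", 2.0)]
print(family_abundance(annotations))  # {'GT4': 3.5, 'GT97': 2.0}
```

Because a multi-family gene contributes to every family it carries, the family totals can add up to more than the total gene abundance. That is expected when counting domains; if you need the totals to sum exactly, splitting each gene's abundance among its families is one alternative, though it is not what the answer describes.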