Question

Pfam only identifies collagen

0

13 months ago

Eduarda • 0

Hello! I'm trying to do an analysis of long non-coding RNAs, so I need to make sure they are non-coding. Among other methods, I tried running my sequences against Pfam with hmmscan, using -E 0.001 and --domE 0.001. The results are what's confusing me: every single hit is to a collagen domain. I even BLASTed one of the sequences and it had nothing to do with collagen, though it was indeed coding. I've already tested this with two different datasets, albeit from the same species. Does anyone know what's happening?

RNA-seq lncRNA • 833 views

ADD COMMENT • link updated 13 months ago by Ram 45k • written 13 months ago by Eduarda • 0

0

I'm trying to do an analysis of long non-coding RNAs

The Pfam database covers proteins and protein families, so it is not clear what you are doing above.

ADD REPLY • link 13 months ago by GenoMax 154k

0

My sequences are potential long non-coding RNAs, and the literature says to use protein databases to confirm that they do not match any protein families. What seemed strange was that every single protein-coding transcript matched collagen.

ADD REPLY • link 13 months ago by Eduarda • 0

score 1 · Answer 1 · 2024-08-26

My recollection is that collagen sequences are highly repetitive: the collagen domain is built from Gly-X-Y triplets, where X and Y are often proline, so it is rich in Gs and Ps. If your putative ncRNAs end up with a strong repetitive bias after translation, you may get non-specific collagen matches. Many "regular" proteins end up matching collagen for the same reason even though they are not related to it.
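
One rough way to check whether this is what is happening: collagen domains are built from Gly-X-Y triplet repeats, so you can translate a transcript in each forward frame and see how often glycine falls on every third residue. The sketch below uses only the Python standard library. The function names (`translate`, `gly_triplet_fraction`, `frame_report`) and the example sequence are made up for illustration, and this is a quick heuristic, not what hmmscan itself computes.

```python
from collections import Counter
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt, frame=0):
    """Translate a nucleotide string in one forward frame (0, 1 or 2).

    Stop codons become '*'; codons with ambiguous bases become 'X'.
    """
    nt = nt.upper().replace("U", "T")
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def gly_triplet_fraction(protein):
    """Highest fraction of glycine found at every third residue.

    All three possible offsets are tried; a value near 1.0 means the
    sequence looks like a Gly-X-Y repeat.
    """
    best = 0.0
    for offset in range(3):
        column = protein[offset::3]
        if column:
            best = max(best, column.count("G") / len(column))
    return best


def frame_report(nt):
    """For each forward frame, return (frame, Gly periodicity, top 3 residues)."""
    report = []
    for frame in range(3):
        protein = translate(nt, frame)
        top = Counter(protein).most_common(3)
        report.append((frame, gly_triplet_fraction(protein), top))
    return report


if __name__ == "__main__":
    # Hypothetical GC-rich repeat; replace with one of your own transcripts.
    example = "GGTCCTCCA" * 10
    for frame, gly_fraction, top in frame_report(example):
        print(frame, round(gly_fraction, 2), top)
```

If the frames that produce collagen hits score high here, or are dominated by two or three residues, the matches are most likely driven by repeat composition rather than by real homology. It is also worth looking at the bias column that hmmscan reports next to each score.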