Hello! I'm trying to do an analysis of long non-coding RNAs, so I need to make sure they are non-coding. Among other methods, I ran my sequences against Pfam using hmmscan with E set to 0.001 and domE set to 0.001. The results are what's confusing me: every single hit says the sequence matches a collagen domain. I even BLASTed one of them and it had nothing to do with collagen, though it was indeed coding. I've already tested this with two different datasets, albeit from the same species. Does anyone know what's happening?
The Pfam database is for proteins and protein families. It's not clear what you are doing above.
My sequences are potential long non-coding RNAs, and the literature says to search protein databases to confirm that they do not match any protein family. What seemed strange was that every single protein-coding transcript matched collagen.
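In case it helps anyone trying to reproduce this, here is a minimal sketch for checking whether collagen really is the only Pfam family being hit. It assumes hmmscan was rerun with `--domtblout` so there is a per-domain table to parse; the function name `tally_domains` and the file name `hits.domtbl` are made up for illustration, and the 0.001 cutoff just mirrors the domE value from the question.

```python
from collections import Counter


def tally_domains(lines, max_evalue=1e-3):
    """Count distinct query sequences per Pfam model in hmmscan --domtblout output.

    Assumes the standard domtblout layout: column 1 is the target (Pfam model)
    name, column 4 is the query name, column 13 is the per-domain i-Evalue.
    """
    queries_per_domain = {}
    for line in lines:
        # Skip blank lines and the '#' header/footer lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        # 22 whitespace-separated fields, then a free-text description.
        fields = line.split(maxsplit=22)
        if len(fields) < 22:
            continue  # malformed or truncated row
        target, query = fields[0], fields[3]
        i_evalue = float(fields[12])
        if i_evalue <= max_evalue:
            queries_per_domain.setdefault(target, set()).add(query)
    return Counter({dom: len(qs) for dom, qs in queries_per_domain.items()})


if __name__ == "__main__":
    import os

    path = "hits.domtbl"  # hypothetical file name
    if os.path.exists(path):
        with open(path) as handle:
            for domain, n_queries in tally_domains(handle).most_common(10):
                print(domain, n_queries)
```

If one model accounts for nearly every query, it is worth looking at the bias column (column 9 of the same table) for those rows, since a large bias relative to the score usually points to compositionally biased or repetitive sequence rather than real homology.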