Pfam only identifies collagen
19 days ago
Eduarda • 0

Hello! I'm trying to analyze long non-coding RNAs, so I need to make sure they are non-coding. Among other methods, I ran my sequences against Pfam with hmmscan, using a threshold of 0.001 for both E and domE. The results are what's confusing me: every single hit is to a collagen domain. I even BLASTed one of the sequences and it had nothing to do with collagen, though it was indeed coding. I've already tested this with two different datasets, albeit from the same species. Does anyone know what's happening?

RNA-seq lncRNA • 273 views

"I'm trying to do an analysis of long non-coding RNAs"

The Pfam database covers proteins and protein families. It is not clear what you are doing above.


My sequences are potential long non-coding RNAs, and the literature says to search protein databases to confirm that they do not match any protein family. What seemed strange was that every single protein-coding transcript matched collagen.

19 days ago
Mensur Dlakic ★ 28k

My recollection is that collagen sequences are highly repetitive: they are built from Gly-X-Y triplets in which X and Y are frequently proline. If your putative ncRNAs end up with a strong repetitive or compositional bias after translation, you may get non-specific collagen matches. Many "regular" proteins match collagen for the same reason even though they are not related to it.
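
One way to check whether this applies to your hits is to measure how repetitive the translated sequences are. Below is a minimal Python sketch, assuming you already have the translated ORFs as plain amino-acid strings. The function names and both cutoffs (`max_entropy`, `min_gly_fraction`) are illustrative assumptions of mine, not established thresholds, so tune them on sequences you trust. It reports the Shannon entropy of the residue composition and how often glycine occupies a single position of every triplet, which is the pattern a collagen model would be expected to pick up.

```python
import math
from collections import Counter


def shannon_entropy(protein):
    """Shannon entropy (bits per residue) of the amino-acid composition."""
    if not protein:
        return 0.0
    total = len(protein)
    counts = Counter(protein)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def glycine_period3_fraction(protein):
    """Highest glycine fraction found at any one of the three triplet positions."""
    best = 0.0
    for offset in range(3):
        every_third = protein[offset::3]
        if every_third:
            best = max(best, every_third.count("G") / len(every_third))
    return best


def looks_repetitive(protein, max_entropy=3.0, min_gly_fraction=0.5):
    """Flag a sequence with low complexity or a Gly-X-Y-like periodicity.

    Both cutoffs are hypothetical defaults chosen for illustration only.
    """
    protein = protein.upper()
    return (
        shannon_entropy(protein) <= max_entropy
        or glycine_period3_fraction(protein) >= min_gly_fraction
    )


if __name__ == "__main__":
    # Made-up example sequences, not real data.
    examples = {
        "gly_x_y_repeat": "GPPGAPGEP" * 10,
        "mixed_composition": (
            "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
        ),
    }
    for name, seq in examples.items():
        print(
            name,
            round(shannon_entropy(seq), 2),
            round(glycine_period3_fraction(seq), 2),
            looks_repetitive(seq),
        )
```

If most of your collagen hits are flagged by a check like this, the matches are more likely a composition artifact than real homology. The per-hit "bias" column in the hmmscan tabular output is also worth a look: a bias value of the same order as the bit score is another hint of a composition-driven match.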
