Question

Criteria for filtering lncRNAs

9.6 years ago

dineshtripathy9658 ▴ 10

I was going through some papers on criteria for filtering lncRNAs, but I could not find answers to some questions:

Why are transcripts of <= 200 bp filtered out?
Why is the ORF length cutoff > 120?
Why BLAST against SwissProt?
Why is the BLAST e-value cutoff only < 0.001?
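A minimal Python sketch of the first two filters in the list above may help make them concrete. The thresholds are parameters because papers differ (the question mentions an ORF cutoff of 120, the answer below mentions 100). The function names, the defaults, and the choice to count ORF length in codons on the forward strand only are assumptions for illustration, not the criteria used in the linked papers.

```python
# Minimal sketch of a length filter plus an ORF-length filter.
# Assumptions (hypothetical, not from the linked papers): forward strand only,
# ORFs start at ATG, and the ORF cutoff is counted in codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length, in codons (ATG included, stop excluded), of the
    longest ATG-initiated ORF in the three forward reading frames.
    An ORF that runs to the end of the transcript without a stop is counted."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        n_codons = (len(seq) - frame) // 3
        start = None  # codon index of the currently open ATG, if any
        for i in range(n_codons):
            codon = seq[frame + 3 * i : frame + 3 * i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i - start)
                start = None
        if start is not None:  # ORF still open at the end of the transcript
            best = max(best, n_codons - start)
    return best


def passes_length_and_orf_filters(seq: str, min_length: int = 200,
                                  max_orf_codons: int = 100) -> bool:
    """Keep a candidate only if it is longer than min_length nt and its
    longest ORF is at most max_orf_codons codons (hypothetical defaults)."""
    return len(seq) > min_length and longest_orf_codons(seq) <= max_orf_codons


# Usage with two made-up transcripts:
candidates = {"tx1": "C" * 250, "tx2": "ATG" + "AAA" * 150 + "TAA"}
kept = [tid for tid, s in candidates.items() if passes_length_and_orf_filters(s)]
# kept == ["tx1"]  (tx2 carries a 151-codon ORF)
```

A real pipeline would usually scan the reverse strand as well when the library is unstranded.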

Here are some such papers: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0043047

http://genome.cshlp.org/content/13/6b/1301.full

updated 2.4 years ago by Ram 44k • written 9.6 years ago by dineshtripathy9658 ▴ 10

Ram · Answer 1 · 2015-04-06

9.6 years ago

vahapel ▴ 210

Hi,

The 200-base length cutoff and the ORF length cutoff (>100) are not exact values that define lncRNAs. Because some lncRNAs have the potential to encode micropeptides, these cutoffs serve as a starting point for eliminating that possibility when identifying lncRNAs in an RNA-Seq dataset. SwissProt and UniRef100 are large protein databases; aligning all candidate lncRNAs against the protein sequences deposited in them lets you check whether a candidate encodes a known protein.
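The BLAST step can be sketched in Python as well. This assumes the candidates were searched against a protein database with BLAST+ `blastx`, for example `blastx -query candidates.fa -db swissprot -evalue 1e-3 -outfmt 6 -out hits.tsv` (the file and database names are hypothetical). In the default 12-column tabular output the query ID is the 1st column and the e-value is the 11th; any candidate with a hit below the cutoff is treated as possibly coding and dropped. The 1e-3 default only mirrors the threshold from the question, and the function names are made up for this sketch.

```python
import csv
from typing import Iterable, List, Set

# In the default 12-column "-outfmt 6" layout, column 1 is qseqid and
# column 11 is evalue (0-based indices 0 and 10).
QSEQID_COLUMN = 0
EVALUE_COLUMN = 10


def queries_with_protein_hits(blast_tab: Iterable[str],
                              max_evalue: float = 1e-3) -> Set[str]:
    """Return query IDs that have at least one hit with evalue < max_evalue."""
    hits: Set[str] = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        if float(row[EVALUE_COLUMN]) < max_evalue:
            hits.add(row[QSEQID_COLUMN])
    return hits


def drop_candidates_with_hits(candidate_ids: Iterable[str],
                              blast_tab: Iterable[str],
                              max_evalue: float = 1e-3) -> List[str]:
    """Keep only candidates with no significant protein hit."""
    hits = queries_with_protein_hits(blast_tab, max_evalue)
    return [cid for cid in candidate_ids if cid not in hits]
```

With a file this would be called as `with open("hits.tsv", newline="") as fh: kept = drop_candidates_with_hits(ids, fh)`. The parsing does not depend on which protein database was searched, so it works unchanged with a larger database such as UniProt or RefSeq.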

These filters alone are not enough. You should also evaluate your candidate lncRNAs with coding-potential tools to make sure they are bona fide lncRNAs, such as the Coding Potential Calculator (CPC) (Kong et al., 2007), PhyloCSF (Lin et al., 2011), and the Coding-Potential Assessment Tool (CPAT) (Wang et al., 2013).
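Each of these tools produces its own output format, so the Python sketch below assumes their calls have already been parsed into sets of transcript IDs classified as noncoding (the parsing is not shown). It then keeps candidates that enough tools agree on. The function name and the default of requiring agreement from every tool are hypothetical design choices for illustration, not a published recommendation.

```python
from typing import Dict, Iterable, List, Optional, Set


def consensus_noncoding(candidate_ids: Iterable[str],
                        noncoding_calls: Dict[str, Set[str]],
                        min_votes: Optional[int] = None) -> List[str]:
    """Keep candidates called noncoding by at least min_votes tools.

    noncoding_calls maps a tool name (e.g. "CPC", "CPAT", "PhyloCSF") to the
    set of IDs that tool classified as noncoding. If min_votes is None,
    every tool must agree."""
    if min_votes is None:
        min_votes = len(noncoding_calls)
    kept = []
    for cid in candidate_ids:
        # Count how many tools placed this candidate in their noncoding set.
        votes = sum(cid in calls for calls in noncoding_calls.values())
        if votes >= min_votes:
            kept.append(cid)
    return kept
```

Requiring every tool to agree is the strictest option; lowering `min_votes` trades some specificity for keeping more candidates.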

updated 2.4 years ago by Ram 44k • written 9.6 years ago by vahapel ▴ 210

If your goal is just to confirm that your sequence does not match a coding gene, rather than to annotate it, I would recommend not using SwissProt or the UniRef databases and instead using a larger database (UniProt/RefSeq). SwissProt and UniRef are very well curated, but they are far from complete and are strongly biased in terms of species.