Criteria for filtering lncRNAs
9.7 years ago

I was going through some papers on criteria for filtering lncRNAs, but I could not find answers to some questions:

  1. Why is the length cutoff set to <= 200 bp?
  2. Why is the ORF length cutoff set to > 120?
  3. Why BLAST against SwissProt?
  4. Why is the BLAST e-value cutoff only < 0.001?

Here are some such papers: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0043047

http://genome.cshlp.org/content/13/6b/1301.full

sequencing • 2.6k views
9.7 years ago
vahapel ▴ 210

Hi,

The 200-base length and ORF length (>100) cutoffs are not exact values for lncRNAs. Some lncRNAs have the potential to encode micropeptides, and these thresholds serve as a starting point for ruling out that possibility when identifying lncRNAs in an RNA-Seq dataset. SwissProt and UniRef100 are large protein sequence databases. Aligning every candidate lncRNA against the protein sequences deposited in them lets you check whether a candidate encodes a known protein.

These filters may not be enough on their own. You should also evaluate your candidates with coding-potential tools, such as Coding Potential Calculator (CPC) (Kong et al., 2007), PhyloCSF (Lin et al., 2011), and the Coding-Potential Assessment Tool (CPAT) (Wang et al., 2013), to be sure they are bona fide lncRNAs.
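
As a rough illustration of the length and ORF cutoffs discussed above, here is a minimal Python sketch. The function names and default thresholds are my own assumptions, not taken from the linked papers: it keeps transcripts longer than 200 nt whose longest forward-strand ORF (ATG through stop codon) is at most 300 nt. The thread does not say whether the ORF cutoff is in nucleotides or amino acids, so treat `max_orf_nt` as a placeholder to adjust.

```python
# Illustrative sketch only: names and thresholds are assumptions,
# not the method of any specific published pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq):
    """Length in nt (start codon through stop codon) of the longest
    ATG..stop ORF found in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def passes_basic_filter(seq, min_length=200, max_orf_nt=300):
    """True if the transcript is longer than min_length and its longest
    forward-strand ORF does not exceed max_orf_nt (assumed units: nt)."""
    return len(seq) > min_length and longest_orf_nt(seq) <= max_orf_nt
```

This only scans the forward strand and ignores ORFs without a stop codon, so it is a first-pass filter at best. Candidates that pass would still need the coding-potential tools mentioned above.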


If your goal is just to confirm that your sequence does not match a coding gene, rather than to annotate it, I would recommend not using SwissProt or the UniRef databases and instead using a larger database (uniprot/refseq). SwissProt and UniRef are very well curated, but they are far from complete and are very biased in terms of species.
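
Whichever database is used, the e-value cutoff from the question can be applied after the search. A minimal sketch, assuming the search was written in the default 12-column tabular format (`-outfmt 6` in BLAST+, where column 1 is the query ID and column 11 is the e-value). The function name and the sample rows are made up for illustration.

```python
import csv
import io


def ids_with_protein_hits(blast_tab, max_evalue=1e-3):
    """Return the set of query IDs that have at least one hit with an
    e-value below max_evalue. `blast_tab` is any iterable of lines in
    the default 12-column tabular layout (assumed input format)."""
    hits = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, evalue = row[0], float(row[10])
        if evalue < max_evalue:
            hits.add(qseqid)
    return hits


# Made-up example rows: tx1 has a strong protein hit, tx2 does not.
sample = (
    "tx1\tsp|P1|X\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "tx2\tsp|P2|Y\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t25\n"
)
coding_like = ids_with_protein_hits(io.StringIO(sample))
```

Transcripts whose IDs end up in `coding_like` would be dropped from the lncRNA candidate list, and the rest kept for further checks.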


+1 for using a tool like CPAT
