Question

Detection of ncRNA in RNAseq transcript expression Data: lncRNA vs lincRNA

5.8 years ago

manninm • 0

Hello, I have been asked to identify and quantify the ncRNA transcripts in some datasets from my lab. FYI, the data is poly-A-enriched RNA-seq (we aren't doing any analysis, just attempting to document what exists in the dataset). I found some very helpful hints here at C: Non-coding RNA detection, suggesting that I awk the third column for "ncRNA". I am using Homo_sapiens.GRCh38.86.gtf. Further inspection of my GTF file shows that lincRNA entries are the only matches returned if I grep for "ncRNA", and that they appear in the biotype attribute rather than in the feature column. Would I be remiss in assuming that lincRNA and lncRNA are the same thing, and that searching for this string will capture ALL lncRNA? Any constructive criticism on this approach would be appreciated. Thanks!
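For anyone reading along, here is a minimal sketch of the distinction described above: in a GTF, column 3 holds the feature type (gene, transcript, exon, ...), while the biotype sits inside the attribute string in column 9. The sample lines, gene IDs, and the function name `count_gene_biotypes` are made up for illustration; only the general nine-column GTF layout and the `gene_biotype "..."` attribute form are assumed.

```python
import re
from collections import Counter

# Tiny made-up GTF excerpt (tab-separated, 9 columns), for illustration only.
SAMPLE_GTF = "\n".join([
    "#!genome-build GRCh38",
    '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "A"; gene_biotype "lincRNA";',
    '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "lincRNA";',
    '1\tensembl\tgene\t2000\t3000\t.\t-\t.\tgene_id "G2"; gene_name "B"; gene_biotype "protein_coding";',
    '2\tensembl\tgene\t50\t120\t.\t+\t.\tgene_id "G3"; gene_name "C"; gene_biotype "miRNA";',
])

# The biotype is a key/value pair inside the attribute column (column 9).
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def count_gene_biotypes(lines):
    """Count gene_biotype values over rows whose feature column is 'gene'."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is the feature type, not the biotype.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = BIOTYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


biotype_counts = count_gene_biotypes(SAMPLE_GTF.splitlines())
for biotype, n in sorted(biotype_counts.items()):
    print(biotype, n)
```

For a real annotation, the same function should work on an open file handle, e.g. `count_gene_biotypes(open("Homo_sapiens.GRCh38.86.gtf"))`, since it only iterates over lines.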

RNA-Seq sequencing • 1.4k views

Further investigation into my GTF file has revealed that lincRNAs are not the only ncRNA subtype included in the gene_biotype column. Using rtracklayer for R, I imported my GTF file, converted it to a data frame, and filtered by type == "gene". I then took the gene_biotype column as a vector, grepped for 'RNA', and ran unique(). The returned values are below.

 [1] "miRNA"                         "lincRNA"
 [3] "snRNA"                         "misc_RNA"
 [5] "snoRNA"                        "scaRNA"
 [7] "rRNA"                          "3prime_overlapping_ncRNA"
 [9] "bidirectional_promoter_lncRNA" "scRNA"
[11] "sRNA"                          "vaultRNA"
[13] "macro_lncRNA"                  "Mt_tRNA"
[15] "Mt_rRNA"
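One caveat with the 'RNA' filter, sketched here in Python rather than R: it keeps only biotypes whose name contains that substring, so any non-coding biotype that is named differently would be dropped without warning. The attribute strings below are hypothetical, and "antisense" is used only as an assumed example of such a name; it is worth listing the unfiltered unique values in your own file to see what falls outside the filter.

```python
import re

# Hypothetical gene-level attribute strings. The biotypes here are
# assumptions chosen for illustration, not output from a real GTF.
ATTRIBUTE_STRINGS = [
    'gene_id "G1"; gene_biotype "lincRNA";',
    'gene_id "G2"; gene_biotype "antisense";',
    'gene_id "G3"; gene_biotype "miRNA";',
    'gene_id "G4"; gene_biotype "protein_coding";',
    'gene_id "G5"; gene_biotype "lincRNA";',
]


def unique_biotypes(attribute_strings):
    """Return unique gene_biotype values in order of first appearance."""
    seen = []
    for attrs in attribute_strings:
        match = re.search(r'gene_biotype "([^"]+)"', attrs)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


all_biotypes = unique_biotypes(ATTRIBUTE_STRINGS)

# Equivalent of grepping the biotype vector for 'RNA' ...
rna_named = [b for b in all_biotypes if "RNA" in b]
# ... and the complement, which the grep never shows.
not_rna_named = [b for b in all_biotypes if "RNA" not in b]

print("matched 'RNA':", rna_named)
print("not matched:  ", not_rna_named)
```

Reviewing the second list by hand is a cheap way to decide whether any of those biotypes should also be counted as non-coding.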

written 5.8 years ago by manninm • updated 5.8 years ago by GenoMax

If you truly need all non-coding RNA, then use all of these:

Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.

5.8 years ago by GenoMax

score 2 · Accepted Answer · 2019-11-04

LincRNA means "long intergenic non-coding RNA", whereas lncRNA is just "long non-coding RNA". Many people use the two terms interchangeably. The only scenario I can think of where a lncRNA would not be a lincRNA is a lncRNA that overlaps a coding gene on the antisense strand, but I have a feeling that in the GTF annotation it would still just be labeled as a lincRNA, if that's the term they're using. I think there may also be non-coding isoforms of coding genes, but I don't know how those are defined in the annotation. I don't think those are all that common, and the studies that found them may have just found transcriptional noise (polymerase run-on or something like that). Because your library is poly-A selected, I wouldn't expect you to be able to detect any types of ncRNA other than lncRNA, and there's still uncertainty about the extent to which lncRNAs are polyadenylated.
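To make the long versus other split concrete, here is a rough sketch that partitions the biotype names reported earlier in the thread. Which names count as "long" is my own assumption for illustration (the ones with lincRNA/lncRNA in the name, plus 3prime_overlapping_ncRNA), not something the annotation states, so adjust `ASSUMED_LONG_NCRNA` to whatever definition you settle on.

```python
# Biotype names reported earlier in this thread for Homo_sapiens.GRCh38.86.gtf.
OBSERVED_BIOTYPES = [
    "miRNA", "lincRNA", "snRNA", "misc_RNA", "snoRNA", "scaRNA", "rRNA",
    "3prime_overlapping_ncRNA", "bidirectional_promoter_lncRNA", "scRNA",
    "sRNA", "vaultRNA", "macro_lncRNA", "Mt_tRNA", "Mt_rRNA",
]

# ASSUMPTION: this grouping is illustrative only; it is not taken from
# the annotation or from the posts above.
ASSUMED_LONG_NCRNA = {
    "lincRNA",
    "bidirectional_promoter_lncRNA",
    "macro_lncRNA",
    "3prime_overlapping_ncRNA",
}


def split_by_class(biotypes, long_set):
    """Partition biotype names into (long ncRNA, everything else)."""
    long_like = [b for b in biotypes if b in long_set]
    other = [b for b in biotypes if b not in long_set]
    return long_like, other


long_like, other = split_by_class(OBSERVED_BIOTYPES, ASSUMED_LONG_NCRNA)
print("long ncRNA classes:", long_like)
print("other ncRNA classes:", other)
```

With a poly-A selected library, the first group is where I would expect most of the detectable non-coding signal to be, so reporting the two groups separately may be more informative than one combined ncRNA count.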