Small RNA length distribution
5.4 years ago
bioinfo_ga ▴ 70

I am performing small RNA analysis and using srna-workbench for data trimming (adaptor trimming and length filtering). After length filtering (16-40 bases), I observe that the distinct reads peak at 32 bp, whereas for small RNA a peak is usually observed at 18-22 bp. What could be the reason?

mirna • 3.7k views

Hi,

How did you perform the trimming step, and with which list of adaptor sequences?

Which tool did you use?

Are you sure 32 bp RNAs are not possible? I was thinking of Piwi-interacting RNA (piRNA), for example.

Best


I used the small RNA workbench tool with the inbuilt adapter (LMN_3). I went through published papers, almost all of which report the same trend (18-22 bp). I also checked for piRNA, but it is <0.02% of my data.


OK, maybe just check in the FASTQ whether your trimmed reads have a reasonable nucleotide (base) distribution.
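One way to run that check by hand is to tabulate the fraction of each base at every read position in the trimmed FASTQ. The sketch below is only an illustration: the function names `read_fastq_sequences` and `base_composition` are made up for this example, and it assumes a plain 4-line-per-record FASTQ. QC tools such as FastQC report a similar per-base sequence content plot.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record is the sequence
            yield line.strip().upper()


def base_composition(seqs):
    """Return one {base: fraction} dict per read position (0-based list)."""
    counts = []  # counts[i] counts the bases seen at position i
    for seq in seqs:
        while len(counts) < len(seq):
            counts.append(Counter())
        for i, base in enumerate(seq):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: n / total for b, n in sorted(c.items())})
    return fractions


# Toy input (made up) so the sketch runs without any file on disk.
toy_fastq = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n"
for pos, frac in enumerate(
    base_composition(read_fastq_sequences(io.StringIO(toy_fastq))), start=1
):
    print(pos, " ".join(f"{b}={f:.2f}" for b, f in frac.items()))
```

For real data, pass an open file handle instead of the `io.StringIO` object, for example `open("trimmed.fastq")` (a hypothetical file name). A strong bias towards one base or a fixed motif at the 3' end of the longer reads might hint at leftover adapter, but that is something to verify rather than assume.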


You would not expect very much piRNA unless you are deriving your sequencing samples from the core reproductive organs.*

*Assuming you aren't researching plants


Hello bioinfo_ga,

What's your experience with srna-workbench? I am trying to analyze my plant sRNA-seq data with it, but I don't know how to install it on my Windows system :(. Could you share your experience with me? I would sincerely appreciate it.

Wei Xu

5.4 years ago

Depending on the protocol used (which you did not mention, so it is hard to tell), you could have degradation products remaining from turnover of endogenous mRNA.

However, a small RNA peak at 18-22 bp makes sense, at least in animals, as this would correspond to high levels of miRNA and/or siRNA. Generally, small RNAs have the following ballpark size ranges (you should investigate what is expected for your organism of interest):

  • 21-22 nt siRNA
  • 21-23 nt miRNA
  • 24-31 nt piRNA

The smaller sizes could be the result of over-trimming, the mRNA byproducts as first mentioned, and/or minor degradation of your small reads. If the distribution of your read lengths (i.e. molecule lengths) matches what has been previously observed in the literature I wouldn't worry too much.
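To compare your data with published profiles, it may help to tabulate the length distribution yourself, both for all reads and for distinct (collapsed) sequences, since the question refers to distinct reads. The following is a minimal sketch under stated assumptions: the helper names (`read_fastq_sequences`, `length_distributions`, `peak_length`, `fq`) are illustrative and not part of any tool mentioned here, and the toy records are made up.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def length_distributions(seqs):
    """Return (total, distinct) Counters mapping read length -> count."""
    total = Counter()
    seen = set()  # every unique sequence; fine for a sketch, memory-heavy for huge files
    for seq in seqs:
        total[len(seq)] += 1
        seen.add(seq)
    distinct = Counter(len(s) for s in seen)
    return total, distinct


def peak_length(dist):
    """Length with the highest count (ties go to the shorter length)."""
    return min(dist, key=lambda n: (-dist[n], n))


def fq(name, seq):
    """Build one toy FASTQ record with dummy qualities."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


# Toy data (made up): one abundant 21-mer and two rare, different 32-mers.
toy = fq("a", "T" * 21) * 3 + fq("b", "A" * 32) + fq("c", "C" * 32)
total, distinct = length_distributions(read_fastq_sequences(io.StringIO(toy)))

print("length\ttotal\tdistinct")
for n in sorted(total):
    print(f"{n}\t{total[n]}\t{distinct[n]}")
print("peak (total):", peak_length(total))
print("peak (distinct):", peak_length(distinct))
```

In this toy case the total-read peak and the distinct-read peak differ, because a few highly abundant sequences count once each after collapsing. Whether that explains a 32 bp peak in real data is an open question; the sketch only makes the two views easy to compare.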

Here is an example for Drosophila from this paper (image: smallRNA profile Dros).

2.9 years ago
davmlaw ▴ 130

Hi, as well as plotting the length, you can also check the biotype (i.e. the species of RNA) of the genes your reads overlap.

I've written a Python library called PyReference that includes a command to do this:

# Install
python3 -m pip install pyreference
# Process GTF into a JSON file
pyreference_gff_to_json.py --gtf gencode.hg38.gtf
# Create a setup file (see website for details)
# Run against your BAM file
pyreference_biotype.py ${BAM_FILE}

This generates a graph (image: bam biotype read length graph), as well as a CSV of read counts per biotype for each length.
