smallRNA sequencing workflow and Database usage
6.1 years ago

Dear Biostars, I'm a new PhD student and my main project is the identification and differential expression (DE) analysis of piRNAs in human colorectal cancer (CRC) cell lines using small RNA-seq. I would like to ask about the bioinformatic analysis workflow and which databases/libraries to use for the different small RNA classes. I have found various strategies for identifying piRNAs, and I plan to follow this workflow:

  1. Use bowtie for sequence alignment to map reads to the genome (hg38 iGenome)
  2. annotate reads to rRNA library (which database?)
  3. annotate remaining reads to mature and hairpin libraries (miRNAs) (miRBase)

4a

  1. annotate remaining reads to tRNA library (GtRNAdb)
  2. annotate remaining reads to Rfam library (snRNA, snoRNA, lncRNA)
  3. annotate remaining reads to piRNA database (piRBase or piRNABank)

4b

  1. annotate remaining reads to tRNA library (GtRNAdb)
  2. annotate remaining reads to Rfam library (snRNA, snoRNA, lncRNA)
  3. annotate remaining reads to piRNA cluster database (piRNAclusterdb)

Three publications have shown that piRNAs can also derive from tRNAs: 1) The biogenesis pathway of tRNA-derived piRNAs in Bombyx germ cells 2) The human Piwi protein Hiwi2 associates with tRNA-derived piRNAs in somatic cells 3) tRNA processing defects induce replication stress and Chk2-dependent disruption of piRNA transcription.

Thus, I should change the order of annotation libraries:

alternative 4a

  1. annotate remaining reads to Rfam library (snRNA, snoRNA, lncRNA)
  2. annotate remaining reads to piRNA database (piRBase or piRNABank)
  3. annotate remaining reads to tRNA library (GtRNAdb)

alternative 4b

  1. annotate remaining reads to Rfam library (snRNA, snoRNA, lncRNA)
  2. annotate remaining reads to piRNA cluster database (piRNAclusterdb)
  3. annotate remaining reads to tRNA library (GtRNAdb)
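To make the effect of the library order concrete, here is a minimal Python sketch of sequential annotation. It is not a real aligner: exact lookup in a set stands in for mapping with bowtie, and the function name, library contents and read sequences are all invented for illustration.

```python
def annotate_sequentially(reads, libraries):
    """Label each read with the first library, in the given order, that contains it.

    reads     -- iterable of read sequences (strings)
    libraries -- list of (name, set_of_sequences) pairs, highest priority first
    """
    labels = {}
    remaining = list(reads)
    for name, sequences in libraries:
        unassigned = []
        for read in remaining:
            if read in sequences:
                labels[read] = name
            else:
                unassigned.append(read)
        remaining = unassigned  # only unlabelled reads move on to the next library
    for read in remaining:
        labels[read] = "unannotated"
    return labels


# Toy data (hypothetical sequences): one read is present in both libraries.
shared = "TCCCTGGTGGTCTAGTGGTTAGGATTCGGC"
other_pirna = "TGACATGAACACAGGTGCTCAGATAGCT"
trna_lib = {shared}
pirna_lib = {shared, other_pirna}
reads = [shared, other_pirna, "ACGTACGTACGTACGTACGT"]

trna_first = annotate_sequentially(reads, [("tRNA", trna_lib), ("piRNA", pirna_lib)])
pirna_first = annotate_sequentially(reads, [("piRNA", pirna_lib), ("tRNA", trna_lib)])

print(trna_first[shared])   # tRNA
print(pirna_first[shared])  # piRNA
```

The same read ends up with a different label depending only on which library comes first, so tRNA-derived piRNAs would be counted as tRNA fragments in 4a/4b and as piRNAs in the alternative orders.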

I am also considering re-mapping the reads annotated by the piRNA databases against a transposon library (RepBase), in order to filter the piRBase/piRNABank annotations down to a more robust final piRNA dataset.

A) Which database/library should I use for rRNA annotation? I have read these posts: cannot find biostar stackexchange Human Non Coding Rrna Sequences For Download SEQanswers, but it is not clear to me which one is the best option for an annotation library... pardon my naivety.

B) The next step would be to normalize the counts. I have read about RPM, RPKM, TPM and other types of normalization, but which one is more robust for small RNA counts, and which one should be used for piRNA clusters? Because of the length of piRNA clusters (~5-60 kb), I think I should use TPM normalization to show the relative abundance in different cell lines. Is this correct?
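For reference, the difference between RPM and TPM can be sketched in a few lines of Python. This is only an illustration of the two formulas under the usual definitions (RPM scales by library size only, TPM first divides by feature length); the function names, cluster names, counts and lengths are made up.

```python
def rpm(counts):
    """Reads per million: scale raw counts by the total number of counted reads."""
    total = sum(counts.values())
    return {name: c * 1e6 / total for name, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to one million."""
    rates = {name: counts[name] / (lengths_bp[name] / 1000) for name in counts}
    scale = sum(rates.values())
    return {name: r * 1e6 / scale for name, r in rates.items()}


# Hypothetical piRNA clusters: a short one (5 kb) and a long one (60 kb).
counts = {"cluster_A": 500, "cluster_B": 1500}
lengths_bp = {"cluster_A": 5000, "cluster_B": 60000}

rpm_values = rpm(counts)
tpm_values = tpm(counts, lengths_bp)
print(rpm_values)
print(tpm_values)
```

In this toy case the long cluster has the higher RPM simply because it collects more reads, while the short cluster has the higher TPM once read density per kb is taken into account.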

C) For DE analysis of mature piRNAs, should I follow the edgeR, DESeq2 or limma-voom workflow?

D) Does this workflow seem robust, or does it need corrections?

Thank you for your time and consideration

Konstantinos

sRNA-Seq piRNA annotation normalization DEseq2 • 3.9k views

Regarding C)

One thing to keep in mind is that piRNAs are thought to be one of the most abundant classes of small RNA in animals. As another user commented, many of these annotations could actually be degradation products of other RNAs or simply misclassifications. On top of that, there may even be additional piRNAs in your dataset that are not annotated.

For DE analysis, any of the packages you mentioned should work.


I'm concerned about the misclassification problem, and that's why I'd prefer to use mature piRNAs from piRBase that have been experimentally validated with different methods. As for novel piRNAs, there are some prediction tools that could help find that kind of additional information. If I am able to find the same sequences in all my samples, could I then use them for DE analysis?
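As a rough sketch of what "the same sequences in all my samples" could mean in practice, the snippet below keeps only sequences with a non-zero count in every sample. The function name, sample names, sequences and counts are hypothetical, and a real analysis might use a less strict rule than presence in all samples.

```python
def sequences_in_all_samples(sample_counts):
    """Return the set of sequences with a non-zero count in every sample.

    sample_counts -- dict mapping sample name to {sequence: raw count}
    """
    per_sample = [
        {seq for seq, count in counts.items() if count > 0}
        for counts in sample_counts.values()
    ]
    return set.intersection(*per_sample) if per_sample else set()


# Hypothetical per-sample count tables for candidate novel piRNAs.
sample_counts = {
    "cell_line_1": {"TGACATGAACACAGGTGCTCAGATAGCT": 12, "TAGCTTATCAGACTGATGTTGACTGTTG": 3},
    "cell_line_2": {"TGACATGAACACAGGTGCTCAGATAGCT": 7, "TAGCTTATCAGACTGATGTTGACTGTTG": 0},
    "cell_line_3": {"TGACATGAACACAGGTGCTCAGATAGCT": 20},
}

shared_sequences = sequences_in_all_samples(sample_counts)
print(shared_sequences)
```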


RNACentral would be a good option for reference data.


Do you think it is better to use it as a reference for all libraries or just for rRNA?

There is a problem with piRNA databases regarding the annotation of some piRNA sequences. In this publication, they showed that some sequences classified as piRNAs in piRNABank match other ncRNAs (rRNAs, tRNAs, YRNAs, snRNAs, and snoRNAs). I believe that if I use a database that records the experimental method used to find each piRNA, I could filter it and keep only piRNAs validated with 2 different methods. Fortunately, the updated version of piRBase has this kind of information.
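The filtering idea could look roughly like the Python sketch below. The record layout (a dict with `name`, `sequence` and `methods` keys), the method labels and the entries themselves are assumptions made for illustration; they are not the actual piRBase export format.

```python
def filter_by_evidence(records, min_methods=2):
    """Keep records supported by at least `min_methods` distinct experimental methods."""
    return [r for r in records if len(set(r["methods"])) >= min_methods]


# Hypothetical database entries; the method labels are placeholders.
records = [
    {"name": "piR-example-1", "sequence": "TGACATGAACACAGGTGCTCAGATAGCT",
     "methods": ["small RNA-seq", "PIWI IP"]},
    {"name": "piR-example-2", "sequence": "TAGCTTATCAGACTGATGTTGACTGTTG",
     "methods": ["small RNA-seq"]},
    {"name": "piR-example-3", "sequence": "TCCCTGGTGGTCTAGTGGTTAGGATTCGGC",
     "methods": ["small RNA-seq", "small RNA-seq"]},  # duplicate label counts once
]

validated = filter_by_evidence(records, min_methods=2)
print([r["name"] for r in validated])  # ['piR-example-1']
```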

Thank you for your time

6.1 years ago
A. Domingues ★ 2.7k

Welcome to the wonderfully strange world of piRNAs :) I would first urge you to read two of my previous answers on the subject:

A: piRNA target-interaction database? A: small RNA analysis pipeline!

TL;DR: if you are a novice, use piPipes. It is very easy to set up, especially if you are working with human data, and it has been developed by a lab with good standing in the field. It will do pretty much everything you have listed, with the exception of using the piRNA databases, which, to the best of my knowledge, most high-profile, experienced piRNA groups don't use at all. At least those working in Drosophila/mouse/zebrafish. And why, you ask? Because piRNAs are quite flexible and, in some species, are not transcribed from defined clusters/transcription units, unlike miRNAs for instance. So I don't see how useful those databases are. When analysing piRNAs, ignore what you would do for miRNAs (I am not even sure what "mature piRNAs" are).

If you still want to create/use your own pipeline, read the manual provided with piPipes - it will give very good insight into the steps. You can also have a look at these tools:

http://www.smallrnagroup.uni-mainz.de/software.html

Second, and more important: when looking at somatic piRNAs, extra care should be taken to make sure these are "real" piRNAs. As I mentioned, piRNAs are not (usually*) processed from particular clusters and are not very well defined in terms of sequence composition/size. So what one might think is a piRNA could very well be a degradation product. This is especially true if one is analyzing total small RNA libraries from somatic cells. Ideally, one would identify them with an IP for an Argonaute protein that processes them. In the germline, while the IP is important, it is less critical because in these tissues piRNAs are the most abundant small RNA species. For instance, of the 3 papers you reference, two were done in the germline and one IPs an Argonaute. Also, in Drosophila some piRNAs are produced from clusters, which also appears to be the case in human for _some_ piRNAs (ref). I will also add that somatic piRNAs are still quite controversial, at least for some groups, so you had better make sure those are really piRNAs. I am agnostic on the matter: as long as the evidence is strong, I will accept it.

*everything is an exception in the piRNA world depending on which species is used as a reference for comparison.

If something is not clear, just ask and I will try my best to explain.


Another reason why using databases of piRNAs might not be a great idea:

Non-coding RNA fragments account for the majority of annotated piRNAs expressed in somatic non-gonadal tissues. https://www.ncbi.nlm.nih.gov/pubmed/30271890


Yes, I'm aware of this publication. Currently, I'm following a workflow using the databases but I'll try to follow the suggestions you proposed to use another workflow without the databases. Happy New Year!

6.1 years ago
ti&te ▴ 40

Dear Konstantinos,

I would suggest you take a look at sRNAtoolbox. The tool exists as a virtual machine, a standalone version or an online server. The package is basically designed for the analysis of miRNAs, but the pipeline also annotates other ncRNAs. sRNAde (DE analysis) includes edgeR and DESeq (and NOISeq on the server). I haven't analyzed my own data for piRNAs and other lncRNAs, but I am sure that if you find this tool useful, the developer team will help you. All the best with the analysis, and please keep us updated on the optimal pipeline. We have prepared small RNA libraries without size selection to keep as much information as possible, and from what I see, even different adapter-trimming parameters have an effect on the final results.


It is customary to provide a link when recommending a software package. I assume you are referring to sRNAtoolbox found here?
