Question

Ribo and RNA seq analysis

0

9 months ago

Manu Ayllon • 0

Dear Biostars community, I am working on a project in which I have to generate features from an RNA sequence to classify whether it will be translated or not.

I thought about adding ribosome footprint (RFP) and expression levels as features. For these features, I built a huge Ribo-seq and RNA-seq dataset to cover as many RNA sequences as possible, and I preprocessed them. My RNA sequences are in FASTA format, I indexed them, and here comes my question:

Should I first align the Ribo-seq and RNA-seq datasets to the reference genome with its annotation (taken from GENCODE), and then align the aligned sequences to my indexed RNA sequences? Or is that just a waste of time, so that I can align the Ribo-seq and RNA-seq datasets directly to my indexed RNA sequences?

Thank you for your time.

RNA-seq genomics Ribo-seq • 591 views

updated 8 months ago by Jack Tierney ▴ 410 • written 9 months ago by Manu Ayllon • 0

score 0 · Answer 1 · 2024-03-19

0

8 months ago

Manu Ayllon • 0

I solved this with blastn: I first created a database from the RNA sequences and then ran BLAST with the Ribo-seq sequences as the query. The process is computationally expensive, but splitting the big dataset into smaller datasets works surprisingly well. This is the blastn command line:

blastn -query "$file" -db db/dataset -evalue 1e-05 -perc_identity 60 -max_target_seqs 10 -num_threads 14 -outfmt "6 sseqid" |

You can contact me for the full code. I would also like to know whether this is relevant to the main question. Thanks!
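The splitting step mentioned above could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the function names `split_fasta` and `blast_chunks`, the chunk file naming, and the per-chunk `.hits.txt` output files are assumptions made for the example. The blastn options are copied from the command line in the post, and the part of the pipeline that follows the trailing `|` in the post is not reproduced here.

```shell
#!/bin/sh
# Sketch only: function names, chunk naming, and output paths are illustrative assumptions.

# Split a FASTA file into chunks holding at most N records each.
split_fasta() {
    input="$1"      # FASTA file to split
    per_chunk="$2"  # maximum number of records per chunk
    outdir="$3"     # receives chunk_0001.fa, chunk_0002.fa, ...
    mkdir -p "$outdir"
    awk -v n="$per_chunk" -v dir="$outdir" '
        /^>/ {
            # Open a new chunk file every n header lines.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk_%04d.fa", dir, count / n + 1)
            }
            count++
        }
        # Write each header or sequence line to the current chunk.
        out != "" { print > out }
    ' "$input"
}

# Run blastn on every chunk with the options from the post.
# Writing hits to "<chunk>.hits.txt" is an assumption for this sketch.
blast_chunks() {
    for file in "$1"/chunk_*.fa; do
        blastn -query "$file" -db db/dataset -evalue 1e-05 -perc_identity 60 \
            -max_target_seqs 10 -num_threads 14 -outfmt "6 sseqid" \
            > "$file.hits.txt"
    done
}
```

For example, `split_fasta reads.fa 100000 chunks` followed by `blast_chunks chunks` would process a hypothetical `reads.fa` in pieces of 100,000 reads. The chunk size is arbitrary and would need tuning to the available memory and run time.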

8 months ago by Manu Ayllon • 0

0

Hi Manu Ayllon,

It is difficult to advise as I am unclear on the goal of your project. I will tell you what I understand and you can correct me where needed.

The overall goal is to develop a classifier that will accurately determine whether a given RNA is translated or not, meaning translated but not necessarily encoding a stable protein product.

To develop this classifier, you want to take publicly available Ribo-Seq data and their paired RNA-Seq data to obtain a set of translated RNAs on which you could start to train your model using sequence features (?).

Are you just looking at human data? I cannot tell from your blastn command. What kinds of RNAs are you investigating? I am unclear about the role of blastn and why you wouldn't just use a reference annotation. I am happy to help, just need a bit more info!

8 months ago by Jack Tierney ▴ 410