Ribo and RNA seq analysis
9 months ago

Dear Biostars community, I am working on a project in which I need to generate features from an RNA sequence to classify whether it will be translated or not.

I thought about adding ribosome footprint (RFP) and expression levels as features. For these, I built a large Ribo-seq and RNA-seq dataset to cover as many RNA sequences as possible, and I preprocessed it. My RNA sequences are in FASTA format and I have indexed them. Here is my question:

Should I first align the Ribo-seq and RNA-seq datasets to the reference genome with its annotation (taken from GENCODE), and then align the resulting aligned reads to my indexed RNA sequences? Or is that a waste of time, so that I can align the Ribo-seq and RNA-seq datasets directly to my indexed RNA sequences?

Thank you for your time.

RNA-seq genomics Ribo-seq • 590 views
8 months ago

I solved it using blastn: I first created a database from the RNA sequences and then ran blastn with the Ribo-seq sequences as the query. The process is computationally expensive, but splitting the big dataset into smaller datasets works surprisingly well. This is the blastn command line:

blastn -query "$file" -db db/dataset -evalue 1e-05 -perc_identity 60 -max_target_seqs 10 -num_threads 14 -outfmt "6 sseqid" |

You can contact me for the full code. I would also like to know whether this is relevant to the main question. Thanks!
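
For readers who want to try the splitting step mentioned above, here is a minimal sketch in Python. It is not the code used in this post. It assumes the Ribo-seq reads are stored as a plain FASTA file, and the names `read_fasta`, `split_fasta`, `records_per_file` and the `chunk_0000.fasta` naming scheme are hypothetical choices made for illustration.

```python
from itertools import islice
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Multi-line sequences are joined into a single string.
    """
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line, []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path, out_dir, records_per_file=100_000):
    """Write consecutive batches of records to numbered FASTA files.

    Returns the list of chunk paths in the order they were written.
    The default batch size is an arbitrary illustrative value.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    written = []
    index = 0
    while True:
        # Pull the next batch lazily so the whole file is never in memory.
        batch = list(islice(records, records_per_file))
        if not batch:
            break
        out_path = out_dir / f"chunk_{index:04d}.fasta"
        with open(out_path, "w") as out:
            for header, seq in batch:
                out.write(f"{header}\n{seq}\n")
        written.append(out_path)
        index += 1
    return written
```

Each chunk file could then be passed as `"$file"` to the blastn command above, one run per chunk. A suitable chunk size depends on read length and available memory, so the default here should be treated as a placeholder.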


Hi Manu Ayllon,

It is difficult to advise as I am unclear on the goal of your project. I will tell you what I understand and you can correct me where needed.

The overall goal is to develop a classifier that will accurately determine whether a given RNA is translated or not, meaning translated but not necessarily encoding a stable protein product.

To develop this classifier, you want to take publicly available Ribo-seq data and the paired RNA-seq data to obtain a set of translated RNAs on which you could start to train your model using sequence features (?).

Are you just looking at human data? I cannot tell from your blastn command. What kinds of RNAs are you investigating? I am unclear about the role of blastn and why you wouldn't just use a reference annotation. I am happy to help, just need a bit more info!
