Isolating in-frame reads from a file
7.1 years ago
Sara ▴ 260

I have Ribo-seq data (already aligned) and I am trying to isolate the in-frame reads. Do you know how I can do that?

RNA-Seq • 1.4k views

Just curious: what is an in-frame read? I know what a translation frame is, but I wonder how a read can be 'in-frame'.


When dealing with Ribo-seq data, we don't take the whole read into account. We use only one position, which usually corresponds to the first nucleotide in the P-site of the ribosome. Because each read (RPF: ribosome-protected fragment) is characterized by a single position, you can see at nucleotide resolution whether a read falls in frame F1, F2 or F3 of a CDS. F1 is the frame of the ATG start codon, so the reads in F1 are the in-frame reads.
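
To make the frame assignment concrete, here is a minimal sketch in Python. The function name `frame_of` and the coordinates are made up for illustration, and it assumes both positions are 0-based coordinates on the same spliced transcript, read in the forward direction.

```python
def frame_of(psite_pos: int, cds_start: int) -> int:
    """Return 0, 1 or 2 for frames F1, F2, F3 relative to the start codon.

    Assumption: both arguments are 0-based positions on the same
    transcript, and cds_start is the A of the ATG.
    """
    return (psite_pos - cds_start) % 3


# Hypothetical transcript whose ATG starts at position 100.
cds_start = 100
for psite in (100, 101, 102, 103):
    print(psite, "F%d" % (frame_of(psite, cds_start) + 1))
# prints:
# 100 F1
# 101 F2
# 102 F3
# 103 F1
```

A read is "in-frame" when its reduced position gives 0 here, i.e. F1.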


Thanks!

7.1 years ago
glihm ▴ 660

Hello Sara!

At the moment, Ribo-seq tools let you study things like the read-length (k-mer size) distribution and the periodicity, but no tool is available to work on the reads directly.

I am currently finishing a library that does all of this for people working with Ribo-seq (I have been working almost exclusively with Ribo-seq for 2 years)! :)

So, for now, you'll need to implement it yourself. Basically, to extract in-frame reads you have to:

  1. Do the global trimming and filtering (adapters, quality, rRNA).

  2. Reduce each read to a single position (the P-site of the ribosome, or another site depending on your study). Choose this site according to the enzyme you used to digest the RPFs. Also, depending on your organism, you should count from the 5' or the 3' end of your RPFs. I recommend using an existing tool to detect the P-site position from your filtered reads.

  3. Get the in-frame positions from a GFF3 file (CDSs only; I strongly suggest selecting transcripts, depending on the organism you are working with). If you are working with prokaryotes or yeast, this should be easy. With human or other complex genomes, you should look at the most expressed transcripts and/or the APPRIS classification of transcripts (or the transcript support level).

  4. Extract only the reads whose reduced position falls in the ATG frame of each transcript.
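
Points 2 to 4 can be sketched roughly as follows in Python. This is only an illustration under strong assumptions: the CDS is a single exon in genomic coordinates (so no splicing to handle), reads are ungapped, and the P-site offsets in `PSITE_OFFSET` are placeholder numbers counted from the 5' end, not recommended values. All names (`Read`, `CDS`, `psite_position`, `in_frame_reads`) are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Read(NamedTuple):
    name: str
    start: int    # 0-based leftmost aligned position
    length: int   # aligned length (ungapped alignment assumed)
    strand: str   # "+" or "-"


class CDS(NamedTuple):
    start: int    # 0-based leftmost position, inclusive
    end: int      # exclusive
    strand: str   # "+" or "-"


# Placeholder values: read length -> distance from the 5' end to the P-site.
# Estimate these from your own data; they depend on enzyme and organism.
PSITE_OFFSET = {28: 12, 29: 12, 30: 13}


def psite_position(read: Read) -> Optional[int]:
    """Point 2: reduce a read to one genomic position, or None if no offset."""
    offset = PSITE_OFFSET.get(read.length)
    if offset is None:
        return None
    if read.strand == "+":
        return read.start + offset
    # On the minus strand the 5' end is the rightmost aligned base.
    return read.start + read.length - 1 - offset


def frame(psite: int, cds: CDS) -> Optional[int]:
    """Point 3: frame (0, 1, 2) of a position inside the CDS, else None."""
    if not (cds.start <= psite < cds.end):
        return None
    if cds.strand == "+":
        return (psite - cds.start) % 3
    # Minus strand: the first base of the start codon is at cds.end - 1.
    return (cds.end - 1 - psite) % 3


def in_frame_reads(reads: List[Read], cds: CDS) -> List[Read]:
    """Point 4: keep reads on the CDS strand whose P-site is in frame 0."""
    kept = []
    for read in reads:
        if read.strand != cds.strand:
            continue
        psite = psite_position(read)
        if psite is not None and frame(psite, cds) == 0:
            kept.append(read)
    return kept


cds = CDS(start=100, end=400, strand="+")
reads = [
    Read("r1", 88, 28, "+"),  # P-site at 100, frame 0
    Read("r2", 89, 28, "+"),  # P-site at 101, frame 1
    Read("r3", 91, 28, "+"),  # P-site at 103, frame 0
    Read("r4", 88, 27, "+"),  # no offset defined for length 27
]
print([r.name for r in in_frame_reads(reads, cds)])  # ['r1', 'r3']
```

For spliced transcripts the frame has to be computed along the joined CDS exons (or on transcriptome alignments) rather than with a simple genomic subtraction.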

So this is a lot of work, depending on the organism (I started writing the library 1 year ago). The library is not ready to be released yet (it should be ready around November or December), but if you need help implementing this feature, feel free to ask! With the 4 points I mentioned, you should be able to do it easily using the samtools library.
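
If you export the alignments as SAM text (for example with `samtools view`), the 5' end of each read can be computed from the FLAG, POS and CIGAR fields before applying a P-site offset. Here is a minimal sketch in plain Python; the function name `five_prime_end` is hypothetical, and it ignores soft-clipped bases (it returns the 5'-most aligned base).

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_end(sam_line: str):
    """Return (reference name, 0-based 5' position, strand), or None if unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":  # 0x4: read is unmapped
        return None
    # Only M, D, N, = and X consume bases on the reference.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    start = pos - 1  # POS is 1-based in SAM
    if flag & 0x10:  # 0x10: read aligned to the reverse strand
        return rname, start + ref_len - 1, "-"
    return rname, start, "+"


seq, qual = "A" * 28, "I" * 28
forward = "\t".join(["r1", "0", "chrI", "101", "60", "28M", "*", "0", "0", seq, qual])
reverse = "\t".join(["r2", "16", "chrI", "101", "60", "28M", "*", "0", "0", seq, qual])
print(five_prime_end(forward))  # ('chrI', 100, '+')
print(five_prime_end(reverse))  # ('chrI', 127, '-')
```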

Feel free to ask me for more details if you want to implement it.

Hope this helps!

Best, glihm
