Plot assigned / unassigned reads distance to 3'UTR
0
0
Entering edit mode
3.3 years ago
nlehmann ▴ 150

Hi,

I would like to plot RNA-seq reads distance to all 3'UTR in the reference GTF to see where the read end up. I could figure out something but I wonder if there is a better one, because the output file I get in roughly 80Go !! So not handy for plotting...

Here is what I got.

  1. Get featureCounts output BAM file: I do not filter this BAM file as I precisely want to keep both assigned and unassigned reads.
  2. Turn reference annotation GTF to BED file (which I have already pre-filtered to get only 3'UTR features), as well as the previous BAM to BED file. Here are tiny examples of both BED files:

reads.bed

chr1    30  40  b1  1   +
chr1    100 150 b4  1   +
chr1    270 350 b2  1   +
chr1    270 300 b3  1   -
chr1    600 700 b5  1   +
chr1    1000    4000    b6  1   -
chr1    1000    1030    b7  1   +

annotation.bed

chr1    10  20  a1  1   +
chr1    300 400 a2  1   +
chr1    350 400 a3  1   -
  1. Then I use bedtools closest to get the distance of each read to the closest 3'UTR: bedtools closest -a reads.bed -b annotation.bed -D b -s > distances.bed Here's a tiny example of distances.bed (distances are in the last column):

     chr1   30  40  b1  1   +   chr1    10  20  a1  1   +   11
     chr1   100 150 b4  1   +   chr1    10  20  a1  1   +   81
     chr1   270 350 b2  1   +   chr1    300 400 a2  1   +   0
     chr1   270 300 b3  1   -   chr1    350 400 a3  1   -   51
     chr1   600 700 b5  1   +   chr1    300 400 a2  1   +   201
     chr1   1000    4000    b6  1   -   chr1    350 400 a3  1   -   -601
     chr1   1000    1030    b7  1   +   chr1    300 400 a2  1   +   601
    
  2. Lastly, I load the distances.bed file in R and plot a density plot of all distances, which gives me something like this:

density plot

However, this plot have been made with only a sample of the file distances.bed, as the original one is far too big (80Go). I have a total of ~200 millions reads, so this much data points for plotting and lines in distances.bed. I am wondering if there is a better way to reach this ? Maybe using some normalized counts (see example here figure S1F)? But then I wouldn't know how to keep both assigned and unassigned reads.

Thanks for your help !

bedtools RNA-seq closest • 866 views
ADD COMMENT
0
Entering edit mode

what does distances.bed looks like please ?

ADD REPLY
0
Entering edit mode

I just edited my post with files example. Let me know if it's clear enough.

ADD REPLY

Login before adding your answer.

Traffic: 2253 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6