Question

Identification of APAs (alternative polyadenylation) from standard RNA-seq

8.0 years ago

aln ▴ 320

Dear all,

I would like to ask whether any of you have experience extracting information about alternative polyadenylation (where a single gene produces more than one transcript, similar to alternative splicing) from standard RNA-seq data. The most relevant article I found is http://www.nature.com/articles/ncomms6274, in which the authors developed a tool, DaPars, for the de novo identification of dynamic APAs (between two conditions). The input should be a BedGraph file, converted from the BAM file produced by an RNA-seq aligner such as TopHat. But in my case I want to compare two conditions from different studies, so I should probably correct for batch effects before submitting anything to DaPars...

So, my questions are:

Has anyone used DaPars before and got good results?
Does anyone know of other tools or ways of solving the same problem? (Ideally, when a problem is new to me, I try to compare different solutions for better understanding.)

Thanks in advance!

rna-seq alternative polyadenylation • 2.4k views

updated 8.0 years ago by i.sudbery 20k • written 8.0 years ago by aln ▴ 320

score 2 · Answer 1 · 2016-11-22

We have used DaPars and got results out of it, but never very many. I don't know whether this is because we only have mild effects or because DaPars is not very sensitive.

We have also approached this problem using DEXSeq:

Divide your gene into exons, or parts of exons, that can be used to build any transcript in the gene. For example, if you have the following three transcripts in a gene:
```
|****|----|****|----|********|-----|********|
|****|-------------------|***|-----|****|
|****|----|*******|
```
you would get the following "chunks":
```
   1         2   3     4    5          6   7
|****|----|****|**|--|****|***|-----|****|***|
```
We use the gtf2gtf tool from CGAT to do this, being careful to remove retained-intron chunks. Given that so many APAs are just after the end of the CDS, you might also want to separate the CDS and UTR into separate chunks.
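To make the chunking idea concrete, here is a minimal Python sketch. It is not the CGAT gtf2gtf implementation, just an illustration of splitting exons at every exon boundary. The coordinates are made up to mirror the diagram above, and the function name `flatten_to_chunks` is hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def flatten_to_chunks(transcripts: Dict[str, List[Interval]]) -> List[Interval]:
    """Split the union of all exons of a gene at every exon boundary."""
    exons = [ex for exon_list in transcripts.values() for ex in exon_list]
    # Every exon start or end is a potential chunk boundary.
    boundaries = sorted({pos for start, end in exons for pos in (start, end)})
    chunks = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep a segment only if at least one exon covers it (drops introns).
        if any(start <= left and right <= end for start, end in exons):
            chunks.append((left, right))
    return chunks


# Hypothetical coordinates for the three transcripts in the diagram.
transcripts = {
    "t1": [(0, 4), (8, 12), (16, 24), (29, 37)],
    "t2": [(0, 4), (21, 24), (29, 33)],
    "t3": [(0, 4), (8, 15)],
}

chunks = flatten_to_chunks(transcripts)
for number, (start, end) in enumerate(chunks, start=1):
    print(number, start, end)  # seven chunks, numbered as in the diagram
```

A real annotation would also need chromosome, strand and gene IDs, plus the retained-intron filtering mentioned above, which this sketch leaves out.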
If you have unstranded RNA-seq, you'll also want to remove chunks that overlap on opposite strands, because differential expression of an overlapping gene is likely to look like APA. We use the following script to do that:
```
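  # Remove the parts of each chunk that overlap a feature on the opposite strand (-S).
  bedtools subtract -a infile.gtf -b infile.gtf -S > tmp1

  # Merge what is left, counting the intervals merged into each region; keep
  # regions built from more than one interval and subtract them from tmp1.
  # Requires bash for the <( ... ) process substitution.
  bedtools merge -i <(sort -k1,1 -k4,4n tmp1) -c 6 -o count -d -2 |
    awk '$4>1' |
    bedtools subtract -a tmp1 -b stdin |
    gzip > outfile.gtf.gz

  rm tmp1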
```
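If it helps to see the logic without bedtools, here is a rough Python sketch of the opposite-strand filter. It is a simplification, not a port of the script above: it drops any chunk that overlaps a chunk on the other strand entirely, whereas `bedtools subtract` trims only the overlapping bases. The `Chunk` type and the function names are hypothetical.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    chrom: str
    start: int   # 0-based, half-open
    end: int
    strand: str  # "+" or "-"
    gene_id: str


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True if the two chunks share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def drop_antisense_overlaps(chunks: List[Chunk]) -> List[Chunk]:
    """Keep only chunks that do not overlap any chunk on the opposite strand.

    Simplification: whole chunks are dropped rather than trimmed.
    The all-against-all comparison is fine for a sketch but slow genome-wide.
    """
    return [
        c for c in chunks
        if not any(c.strand != other.strand and overlaps(c, other) for other in chunks)
    ]


# Made-up example: geneA and geneB overlap on opposite strands.
example = [
    Chunk("chr1", 0, 100, "+", "geneA"),
    Chunk("chr1", 50, 150, "-", "geneB"),
    Chunk("chr1", 200, 300, "+", "geneA"),
]
kept = drop_antisense_overlaps(example)
print(kept)  # only the chr1:200-300 chunk survives
```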
Use featureCounts to count reads mapping to each chunk, being careful to include reads that map to multiple chunks from the same gene.
Do differential exon usage analysis using DEXSeq.
Identify which of your chunks correspond to the end of a transcript. In the above example, these would be 3, 6 and 7. Then check which of the chunks DEXSeq calls as differentially used are among this list of "last exon chunks".
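The last step can be sketched in Python as follows. This is only an illustration: the coordinates are made up to mirror the diagram, the adjusted p-values are invented placeholders rather than real DEXSeq output, and `last_exon_chunks` is a hypothetical helper. In practice you would read the chunk IDs and adjusted p-values from your own DEXSeq results table.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def last_exon_chunks(
    transcripts: Dict[str, List[Interval]],
    chunks: List[Interval],
    strand: str = "+",
) -> Set[int]:
    """Return the 1-based numbers of chunks that contain a transcript 3' end."""
    terminal: Set[int] = set()
    for exons in transcripts.values():
        if strand == "+":
            # On the plus strand the 3' end is the largest exon end.
            three_prime = max(end for _, end in exons)
            hits = [i for i, (_, end) in enumerate(chunks, start=1) if end == three_prime]
        else:
            # On the minus strand the 3' end is the smallest exon start.
            three_prime = min(start for start, _ in exons)
            hits = [i for i, (start, _) in enumerate(chunks, start=1) if start == three_prime]
        terminal.update(hits)
    return terminal


# Hypothetical coordinates for the three transcripts and seven chunks above.
transcripts = {
    "t1": [(0, 4), (8, 12), (16, 24), (29, 37)],
    "t2": [(0, 4), (21, 24), (29, 33)],
    "t3": [(0, 4), (8, 15)],
}
chunks = [(0, 4), (8, 12), (12, 15), (16, 21), (21, 24), (29, 33), (33, 37)]

terminal = last_exon_chunks(transcripts, chunks)

# Invented adjusted p-values per chunk number, standing in for DEXSeq output.
padj = {1: 0.8, 2: 0.03, 6: 0.001, 7: 0.4}
significant = {chunk for chunk, p in padj.items() if p < 0.05}

# Candidate APA events: differentially used chunks that are also 3' ends.
apa_candidates = sorted(terminal & significant)
print(sorted(terminal), apa_candidates)
```

With these toy inputs the terminal chunks come out as 3, 6 and 7, matching the example, and only chunk 6 is both terminal and significant.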

It's a bit fiddly, and doesn't sound like it should work, but we've found that we detect more events this way than we do with DaPars.