Question

Which Aligner Is Best Suited For CLIP-Seq Data?

2

12.3 years ago

Mathew Bunj ▴ 40

I have RNA-seq data (CLIP-seq) and want to identify the RNA binding partners of my proteins of interest. The data are paired-end, 2x50 bp. Which aligner is most suitable for detecting RNAs in my sequencing data?

Thanks.

alignment rna sequencing • 7.2k views

ADD COMMENT • link updated 12.3 years ago by 14134125465346445 ★ 3.6k • written 12.3 years ago by Mathew Bunj ▴ 40

0

Hi @Mathew Bunj, is your CLIP-seq dataset of the kind that generates mutations (e.g. UV light protocol) with respect to the reference genome?

ADD REPLY • link 12.3 years ago by 14134125465346445 ★ 3.6k

1

The procedure includes UV cross-linking, so yes, it may generate some mutations, particularly T to C. Do you have any suggestions?

ADD REPLY • link 12.3 years ago by Mathew Bunj ▴ 40

score 4 · Answer 1 · 2013-01-10

I don't see why a spliced aligner (e.g. TopHat, but see the recent question "Is tophat the only mapper to consider for RNA-seq data?") wouldn't work for CLIP-seq/HITS-CLIP. A quick search for papers using the technique (1, 2, 3) shows they are not consistent: they use BLAT, a custom aligner, or MosaikAligner. Can't hurt to try different aligners, I suppose.

There may be other things to be careful about besides aligner choice. For example, from this protocol, it looks like there's a digestion step involved. I'm not sure whether this means the fragments you end up sequencing are necessarily small, but depending on the experimental protocol used you may need to be careful about insert sizes for the PE reads and/or trimming adapter sequence.
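To illustrate the adapter point: if the digested fragments are shorter than the 50 bp read length, each read runs through the insert and into the 3' adapter. Below is a minimal sketch of exact-match 3' adapter trimming, just to show the idea. The function name `trim_adapter`, the `min_overlap` default, and the adapter sequence are assumptions for illustration; use the adapter from your own library prep kit, and in practice a dedicated trimmer that tolerates sequencing errors is the better choice.

```python
# Placeholder adapter sequence (assumption): replace with the 3' adapter of your kit.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Handles two cases: a full copy of the adapter anywhere in the read,
    or a prefix of the adapter (at least min_overlap bases) at the read's end.
    """
    # Case 1: the whole adapter is present; keep only what precedes it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends part-way through the adapter.
    # Try the longest possible adapter prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found: return the read unchanged.
    return read


insert = "ACGTACGTTGCA"
full = trim_adapter(insert + ADAPTER)         # full adapter after a short insert
partial = trim_adapter(insert + ADAPTER[:6])  # read ends inside the adapter
untouched = trim_adapter(insert)              # no adapter present
print(full, partial, untouched)
```

Reads that shrink to a few bases after trimming are usually worth discarding before alignment, since they map ambiguously.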

score 3 · Answer 2 · 2013-01-18

12.3 years ago

14134125465346445 ★ 3.6k

If your protocol includes UV cross-linking and can generate T to C mutations in some of the reads, but not all of them, one possibility is to assemble the read clusters first, then align the region under the peak of reads to the reference genome. Pinball does just that:

Pinball is an alignment-free ChIP-seq and HITS-CLIP analysis tool:
https://github.com/avilella/pinball/blob/master/INSTALL

If you want to skip installation and set up, you can try the virtual machine here:
ftp://ftp.ebi.ac.uk/pub/databases/ensembl/avilella/pinball/PinballVM.1.0.4.ova
The installation procedure of the virtual machine is the same as described here:
http://www.ensembl.org/info/data/virtual_machine.html

Depending on your read length, you may want to tweak the --error-rate parameter so that reads with T/C or other mutations still align with mismatches. For example, if you have 36 bp reads, require an overlap of 2/3 of the read length (24 bp), and want to allow 1 mismatch per 24 bp, you can set --error-rate=0.042 (just above 1/24 ≈ 0.0417).
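The arithmetic for other read lengths can be sketched as follows. This only reproduces the calculation above; the helper name `min_error_rate` is hypothetical, and rounding up to three decimals so the value stays strictly above mismatches/overlap is an assumption based on the ">1/24" in the example, not documented Pinball behaviour.

```python
def min_error_rate(overlap_bp: int, mismatches: int = 1, decimals: int = 3) -> float:
    """Smallest rate with `decimals` decimal places that is strictly
    greater than mismatches / overlap_bp."""
    scale = 10 ** decimals
    # Integer arithmetic avoids floating-point surprises near exact fractions.
    return ((mismatches * scale) // overlap_bp + 1) / scale


# The 36 bp example from above: overlap is 2/3 of the read length.
read_length = 36
overlap = read_length * 2 // 3            # 24 bp
rate_36 = min_error_rate(overlap)         # 0.042

# The same calculation for 50 bp reads (overlap of 33 bp, an assumption
# that the same 2/3 rule is applied).
overlap_50 = 50 * 2 // 3                  # 33 bp
rate_50 = min_error_rate(overlap_50)      # 0.031

print(f"--error-rate={rate_36}", f"--error-rate={rate_50}")
```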

Hope it helps.

1

Could you maybe check the permissions on your VM download link? I can't download it.

ADD REPLY • link 12.3 years ago by Ido Tamir 5.2k

0

Thanks for the heads up, I chmod'ed the files now.

ADD REPLY • link 12.3 years ago by 14134125465346445 ★ 3.6k

1

I installed the VM, but it is giving me two errors: "missing the checkout Variation" and "Missing the checkout Funcgen".

ADD REPLY • link 12.3 years ago by kanwarjag ★ 1.2k

score 1 · Answer 3 · 2013-01-14

12.3 years ago

UnivStudent ▴ 440

I would take a look at this paper, which explains the data analysis and some considerations for finding binding sites at single-bp resolution: "Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data" by Zhang & Darnell.
