I have RNA-seq data (CLIP-seq) and want to find the RNA-binding partners of my proteins of interest. The data are paired-end, 2x50 bp. Which aligner would be most suitable for detecting RNAs in my sequencing data?
Thanks.
I don't see why a spliced aligner (e.g. TopHat, but see recent question Is tophat the only mapper to consider for RNA-seq data?) wouldn't work for CLIP-seq/HITS-CLIP. A quick search for papers using the technique (1, 2, 3) shows they are not consistent (BLAT, a custom aligner, or MosaikAligner). Can't hurt to try different aligners, I suppose.
There may be other things to be careful about besides aligner choice. For example, from this protocol, it looks like there's a digestion step involved. I'm not sure whether this means the fragments you end up sequencing are necessarily small, but depending on the experimental protocol used you may need to be careful about insert sizes for the PE reads and/or about trimming adapter sequence.
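To make the insert-size concern concrete, here is a minimal sketch of the arithmetic for 2x50 reads. The function names and the example fragment lengths are made up for illustration and are not taken from any protocol: when the fragment is shorter than the read, the read runs into adapter sequence, and when it is shorter than twice the read length, the two mates overlap.

```python
def adapter_readthrough(fragment_len: int, read_len: int) -> int:
    """Number of bases at the 3' end of a read that come from the adapter
    rather than the fragment (0 if the fragment is at least as long as the read)."""
    return max(0, read_len - fragment_len)


def mates_overlap(fragment_len: int, read_len: int) -> bool:
    """True if the two paired-end reads cover some of the same fragment bases."""
    return fragment_len < 2 * read_len


READ_LEN = 50  # PE 2x50, as in the question

# Hypothetical fragment lengths, chosen only to show the three regimes.
for fragment_len in (30, 80, 150):
    print(
        fragment_len,
        adapter_readthrough(fragment_len, READ_LEN),
        mates_overlap(fragment_len, READ_LEN),
    )
```

If many of your fragments fall in the first regime, adapter trimming before alignment matters a lot more than the choice of aligner.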
If your protocol includes UV cross-linking, which can introduce T-to-C mutations in some but not all of the reads, one possibility is to assemble the read clusters first and then align the region under each peak of reads to the reference genome. Pinball does just that:
Pinball is an alignment-free ChIP-seq and HITS-CLIP analysis tool:
https://github.com/avilella/pinball/blob/master/INSTALL
If you want to skip installation and set up, you can try the virtual machine here:
ftp://ftp.ebi.ac.uk/pub/databases/ensembl/avilella/pinball/PinballVM.1.0.4.ova
The installation procedure of the virtual machine is the same as described here:
http://www.ensembl.org/info/data/virtual_machine.html
Depending on your read length, you may want to tweak the --error-rate parameter so that reads carrying T-to-C or other mutations can still align with mismatches. For example, if you have 36 bp reads, require an overlap of 2/3 of the read length (24 bp), and want to allow 1 mismatch per 24 bp, you can set --error-rate=0.042 (just above 1/24).
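The same arithmetic can be sketched for other read lengths. This is only an illustration of the calculation above, not part of Pinball: the helper name `min_error_rate` is made up, and rounding up to three decimals is an assumption chosen to reproduce the 0.042 value.

```python
import math


def min_error_rate(overlap_bp: int, mismatches: int = 1, decimals: int = 3) -> float:
    """Mismatches per overlapping base, rounded UP to `decimals` places so the
    requested number of mismatches is still permitted."""
    scale = 10 ** decimals
    return math.ceil(mismatches / overlap_bp * scale) / scale


read_length = 36
overlap = read_length * 2 // 3       # 2/3 of the read length -> 24 bp
rate = min_error_rate(overlap)       # 1/24 = 0.0416..., rounded up
print(f"--error-rate={rate}")
```

For the 50 bp reads in the question, the same 2/3 rule gives a 33 bp overlap, and one mismatch per overlap works out to a rate of about 0.031.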
Hope it helps.
I would take a look at this paper that explains the data analysis and some considerations to make in order to find single-bp resolution binding sites: Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data by Zhang & Darnell.
Hi @Mathew Bunj, is your CLIP-seq dataset of the kind that generates mutations (e.g. UV light protocol) with respect to the reference genome?
The procedure includes UV cross-linking, and yes, it may generate some mutations, particularly T to C. Do you have any suggestions?