Hi all, a general question: is aligning with a splice-aware aligner (e.g. TopHat) to a reference genome the same as aligning with a non-splice-aware aligner (e.g. bwa) against the full set of transcripts for that genome?
If not, what are the advantages and disadvantages of each?
Just a comment: Cufflinks is not an aligner. The most common aligner to use in conjunction with Cufflinks is TopHat.
To clarify our terminology a bit: Cufflinks is a "transcript analysis tool" designed to work with the "splice junction inference tool" TopHat, which itself relies on the decidedly non-splicey short-read mapper Bowtie.
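To make the distinction concrete, here is a toy sketch in Python. It is not TopHat's or Bowtie's actual algorithm: exact string matching stands in for the unspliced mapper, the sequences are made up, and `map_unspliced`, `infer_junction` and `min_anchor` are hypothetical names chosen for illustration. It only shows that a junction-spanning read maps contiguously to the transcript but not to the genome, and that a junction can be inferred by mapping the two halves of the read separately.

```python
def find_all(haystack, needle):
    """Return every start position of needle in haystack (exact match)."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def map_unspliced(read, reference):
    """Contiguous exact mapping: a stand-in for a non-splice-aware mapper."""
    return find_all(reference, read)


def infer_junction(read, genome, min_anchor=4):
    """Split the read at each allowed position and map both parts separately.

    Returns (intron_start, intron_end) as 0-based, end-exclusive genome
    coordinates for the first split whose right part maps downstream of
    the left part, or None if no such split exists.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        for left_start in find_all(genome, left):
            left_end = left_start + len(left)
            for right_start in find_all(genome, right):
                if right_start > left_end:
                    return (left_end, right_start)
    return None


# Made-up toy sequences (assumption: not real biological data).
exon1 = "ACGTACGGTCA"
intron = "GTAAGTTTTTTTTTCAG"
exon2 = "CTTGGACCATG"

genome = exon1 + intron + exon2
transcript = exon1 + exon2

# A read that spans the exon1/exon2 junction.
read = exon1[-6:] + exon2[:6]

genome_hits = map_unspliced(read, genome)          # no contiguous hit
transcript_hits = map_unspliced(read, transcript)  # one contiguous hit
junction = infer_junction(read, genome)            # split-read inference

print("genome hits:", genome_hits)
print("transcript hits:", transcript_hits)
print("inferred intron:", junction)
```

The flip side, which the sketch also hints at, is that mapping to transcripts can only recover junctions that are already in the annotation, whereas the split-read approach works from the genome alone.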
True splice-aware aligners like GMAP, CAP3, BLAT do not work well with <100bp reads.
thanks for the comment, changed.
... but see GSNAP, which "can detect splicing, multiple mismatches, long indels and combinations thereof, up to a user-specified point total, limited to a single splice or indel per read, provided the read (or parts of the read on each end of the indel or splice) has a consecutive stretch of 14 nt that match the reference sequence." (http://bioinformatics.oxfordjournals.org/content/26/7/873.full)
One downside of GSNAP (currently, if I recall) is that it does not accept FASTQ, only FASTA. I do not know how much of a difference that makes: reads are now quite a bit longer, which makes mapping to the wrong location because of low-quality bases less likely than with the shorter reads of a couple of years ago.
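If FASTA input really is required, dropping the qualities is straightforward. Below is a minimal sketch in plain Python, assuming the common four-lines-per-record FASTQ layout (no wrapped sequence lines); `fastq_to_fasta` is a hypothetical helper, not part of GSNAP or any library, and it is worth checking the current GSNAP documentation first, since the input restriction above is from memory.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy reads from a FASTQ stream to a FASTA stream, dropping qualities.

    Assumes exactly four lines per record: '@' header, sequence,
    '+' separator, quality string. Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: " + header.strip())
        sequence = fastq_handle.readline()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        fasta_handle.write(">" + header[1:].rstrip("\r\n") + "\n")
        fasta_handle.write(sequence.rstrip("\r\n") + "\n")
        count += 1
    return count


# Small in-memory example with made-up reads.
example_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nII#I\n"
)
example_fasta = io.StringIO()
n_records = fastq_to_fasta(example_fastq, example_fasta)
fasta_text = example_fasta.getvalue()
print(fasta_text, end="")
```

For real files, the same function can be called with handles from `open("reads.fastq")` and `open("reads.fasta", "w")` (file names here are placeholders).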