Question

Is Aligning While Considering Splicing The Same As Alignment Against The Full Set Of Transcripts

3

14.4 years ago

Doctoroots ▴ 810

Hi all, a general question: is aligning with a splicing-considerate aligner (e.g. TopHat) against a reference genome the same as aligning with a non-splicing-considerate aligner (e.g. bwa) against the full set of transcripts for that genome?

If not, what are the advantages and disadvantages of each?

splicing alignment transcriptome • 3.5k views

ADD COMMENT • link updated 14.4 years ago by Michael 55k • written 14.4 years ago by Doctoroots ▴ 810

2

Just a comment: Cufflinks is not an aligner. The most common aligner to use in conjunction with Cufflinks is TopHat.

ADD REPLY • link 14.4 years ago by Sean Davis 27k

1

To clarify our terminology a bit: Cufflinks is a "transcript analysis tool" that is designed to work with the "splice junction inference tool" TopHat, which itself relies on the decidedly non-splicey short-read mapper Bowtie.

True splice-aware aligners like GMAP, CAP3, BLAT do not work well with <100bp reads.

ADD REPLY • link 14.4 years ago by Jeremy Leipzig 23k

0

Thanks for the comment, changed.

ADD REPLY • link 14.4 years ago by Doctoroots ▴ 810

0

... but see GSNAP, which "can detect splicing, multiple mismatches, long indels and combinations thereof, up to a user-specified point total, limited to a single splice or indel per read, provided the read (or parts of the read on each end of the indel or splice) has a consecutive stretch of 14 nt that match the reference sequence." (http://bioinformatics.oxfordjournals.org/content/26/7/873.full)
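To make the "consecutive stretch of 14 nt" condition in the quote concrete, here is a minimal Python sketch. It only checks the longest run of matching bases between a read and an equally long reference window; it is not GSNAP's algorithm, and the function name, the sequences, and the `MIN_SEED` constant are made up for illustration.

```python
def longest_match_run(read, ref_window):
    """Longest run of consecutive identical bases, compared position by position (no gaps)."""
    best = run = 0
    for a, b in zip(read, ref_window):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best

MIN_SEED = 14  # the 14 nt figure from the quote above

# Hypothetical 30 nt reference window.
ref = "ACGTACGTTAGCCGATAGGCTAACGTAGCT"

# One mismatch at position 10: matching runs of 10 and 19 bases.
read_one_mismatch = ref[:10] + "T" + ref[11:]

# Mismatches at positions 10 and 20: matching runs of 10, 9 and 9 bases.
read_two_mismatches = ref[:10] + "T" + ref[11:20] + "A" + ref[21:]

for name, read in [("one mismatch", read_one_mismatch),
                   ("two mismatches", read_two_mismatches)]:
    run = longest_match_run(read, ref)
    print(name, run, "seedable" if run >= MIN_SEED else "no 14 nt seed")
```

With these toy sequences the first read keeps a 19 nt exact stretch and passes the condition, while the second read has no exact stretch longer than 10 nt and would not.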

ADD REPLY • link 14.4 years ago by Malcolm.Cook ★ 1.5k

0

One downside of GSNAP (currently, if I recall) is that it does not accept FASTQ, only FASTA. I do not know how much of a difference that makes: reads are now quite a bit longer, so mapping to the wrong location because of low-quality bases is less likely than it was with the shorter reads of a couple of years ago.

ADD REPLY • link 14.4 years ago by Sean Davis 27k

score 4 · Answer 1 · 2010-12-06

4

14.4 years ago

Michael 55k

The main disadvantage of non-splicing-aware alignment: you cannot detect anything that is not in your database of transcripts, e.g. new splice variants, unknown splice sites, unknown transcripts, or wrongly annotated splice sites. You simply would not see them. Aligning against the transcripts might be good for transcript quantification/counting, but I would think the two methods are best used to complement each other.
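A toy Python sketch of that point, under loud assumptions: the sequences are invented, matching is exact-only, and `align_unspliced` / `align_spliced` are hypothetical stand-ins, not how bwa or TopHat actually work. It shows a read from an unannotated exon-skipping isoform that finds no hit in the transcript set but can be placed on the genome once a single split is allowed.

```python
# Toy genome: three exons separated by two introns (all sequences are made up).
EXON1, INTRON1 = "ATGGCCAAGT", "GTAAGTCCCTTTTCAG"
EXON2, INTRON2 = "CCGTTAGCAT", "GTGAGTAAACCCCTAG"
EXON3 = "TTGACCGGAA"
GENOME = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3

# Annotated transcript set: only the full-length isoform is known.
TRANSCRIPTS = {"tx_full": EXON1 + EXON2 + EXON3}

def align_unspliced(read, references):
    """Return (name, offset) of the first exact, ungapped hit, or None."""
    for name, seq in references.items():
        pos = seq.find(read)
        if pos != -1:
            return name, pos
    return None

def align_spliced(read, genome, min_anchor=4):
    """Try each split of the read into two exact segments separated by a gap.

    Returns (left_start, left_end, right_start) in genome coordinates, or None.
    """
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        lpos = genome.find(left)
        while lpos != -1:
            # The right segment must start at least one base after the left one ends.
            rpos = genome.find(right, lpos + cut + 1)
            if rpos != -1:
                return lpos, lpos + cut, rpos
            lpos = genome.find(left, lpos + 1)
    return None

known_read = EXON1[-6:] + EXON2[:6]   # spans the annotated exon1-exon2 junction
novel_read = EXON1[-6:] + EXON3[:6]   # spans an unannotated exon1-exon3 junction

print(align_unspliced(known_read, TRANSCRIPTS))  # hit in tx_full
print(align_unspliced(novel_read, TRANSCRIPTS))  # None: isoform not in the database
print(align_spliced(novel_read, GENOME))         # split hit: end of exon 1, start of exon 3
```

The transcript-only search is simpler and sufficient for counting reads against known isoforms, but the novel junction only becomes visible when the read is allowed to split across the genome.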

ADD COMMENT • link 14.4 years ago by Michael 55k

3

No, the point of TopHat, MapSplice and others is that you don't need to supply them with a reference file of known splice sites; they can detect splice sites "de novo". The fact that you can give (some of them) such a reference file and quantify what's in there is a different matter.

ADD REPLY • link 14.4 years ago by Mikael Huss 4.8k

1

What do you mean by "spl. considerate aligners"? If you take TopHat (splice junction finder) and Cufflinks (transcript assembly), for example, these are algorithms relying on an aligner (Bowtie). One provides a reference sequence for the alignment, but neither tool needs extra annotations.

ADD REPLY • link 14.4 years ago by Michael 55k

0

Thanks for the answer, Michael. About the unknown/novel splice variants: since most of the splicing-considerate aligners I know require an additional reference file containing the splice sites/exons, they can also miss unknown splice variants, no?

ADD REPLY • link 14.4 years ago by Doctoroots ▴ 810

0

You are right; for some reason I was under the impression that TopHat needed a splice-junction reference file.

ADD REPLY • link 14.4 years ago by Doctoroots ▴ 810