I have designed a simple RNA-seq aligner. The program takes two files: the first is a FASTQ file of reads and the second is a transcriptome FASTA file. The goal of this naive aligner is to allow whole exons to be deleted with no penalty.
I want to compare the results with output from TopHat (or any other RNA-seq aligner?), and I know that "TopHat uses a multistep alignment process which starts by aligning reads to transcriptome if genomic annotation is available." (RNA-Seq Data Analysis)
So my question is: is it possible, for testing purposes, to align reads to the transcriptome only, using TopHat?
FYI just published today : http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html
thanks a lot guys :))
Hmm, I'm stuck again, so let me explain the whole situation. I really hope you can help!
I have a transcriptome FASTA file that contains exon sequences. To simplify testing, I generated the reads by joining the sequence at the end of one exon to the sequence at the start of another exon, just to check that my aligner can skip exons.
Here is a sample read:
@4-33330 ATTTGATGTTGGTGGAGTTCTCCAAAATATTTATGCTATTGTAGATCCTAACCATGTTGTTGGTGATGGTAAGAAGTTCTACGATTTCCCCGAAAAGCCTGAGACTTTATTGTTCCGGTCACATAATCGACTCTCTTAGGCATTCAATTT + JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
The reads are 150 bp long, and I set every quality score to J, again to simplify the process.
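For anyone who wants to reproduce this kind of test data, here is a minimal Python sketch of how such a junction-spanning read could be built. The function name make_junction_read, the toy exon strings and the read ID are made up for illustration; only the 150 bp read length and the constant J quality come from the description above. Splitting the read evenly across the two exons is also an assumption.

```python
def make_junction_read(upstream_exon, downstream_exon, read_id, read_len=150):
    """Join the end of one exon to the start of another and format it as FASTQ.

    Hypothetical helper: the read is split evenly across the two exons,
    and every base gets the quality character 'J'.
    """
    half = read_len // 2
    if len(upstream_exon) < half or len(downstream_exon) < read_len - half:
        raise ValueError("exons are too short for the requested read length")
    # Last `half` bases of the upstream exon + first bases of the downstream exon.
    seq = upstream_exon[-half:] + downstream_exon[:read_len - half]
    qual = "J" * len(seq)
    # A FASTQ record is four lines: header, sequence, separator, qualities.
    return f"@{read_id}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Toy exon sequences, not taken from any real transcriptome.
    exon_a = "ACGT" * 50
    exon_c = "TTGCA" * 40
    print(make_junction_read(exon_a, exon_c, "test-1"), end="")
```

Writing each record as four separate lines matters, because most aligners expect the header, sequence, "+" separator and quality string on their own lines.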
then I did the following:
but I got 0 aligned reads :/
Transcriptome alignment like that only makes sense if the exon-skipping events result in annotated isoforms. If not, you'll need to do the normal genomic alignment step in addition.
Thanks Devon, but if I do the normal genomic alignment step as well, that means the aligner would skip not only exons but also introns, right? I found this tutorial on aligning to a transcriptome using BWA, but the results I got were, for example, 75M75S instead of 75M1832N75M; instead of skipped regions I got hard/soft clipping :/
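To make the difference between those two CIGAR strings concrete, here is a small Python sketch that parses a CIGAR string and reports how much of the reference it spans, how many bases are skipped (N) and how many are clipped (S or H). The helper names parse_cigar and summarize are hypothetical; the operation letters follow the SAM format.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to make sure nothing was silently ignored.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def summarize(cigar):
    """Report reference span, skipped bases (N) and clipped bases (S/H)."""
    ops = parse_cigar(cigar)
    return {
        "ref_span": sum(n for n, op in ops if op in REF_CONSUMING),
        "skipped": sum(n for n, op in ops if op == "N"),
        "clipped": sum(n for n, op in ops if op in "SH"),
    }


if __name__ == "__main__":
    for cigar in ("75M75S", "75M1832N75M"),:
        pass
    for cigar in ("75M75S", "75M1832N75M"):
        print(cigar, summarize(cigar))
```

With 75M75S only the first 75 bases are placed on the reference and the remaining 75 are soft-clipped, whereas 75M1832N75M places all 150 bases and records an 1832-base skipped region between the two halves.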
BWA isn't splice-aware; you need to use a splice-aware aligner in this particular case. For the genomic alignment part, at that stage the aligner has no conception of exons or introns; it just tries to align the read in chunks. After that, it looks for novel splice junctions (e.g., from exon skipping) and finally realigns things accordingly. This is often referred to as "two-pass alignment", and you can do it (much faster) with STAR too if you prefer.
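As a rough illustration of the junction-discovery idea only, the Python sketch below collects candidate splice junctions from first-pass alignments given as (position, CIGAR) pairs and counts how many reads support each one. It is not how TopHat or STAR work internally; the function names and the example coordinates are hypothetical, and positions are assumed to be 1-based leftmost coordinates as in SAM.

```python
import re
from collections import Counter


def junctions(pos, cigar):
    """Yield (start, end) of each skipped region (N), 1-based inclusive."""
    ref = pos  # current reference coordinate
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op == "N":
            yield (ref, ref + n - 1)
        if op in "MDN=X":  # only these operations advance along the reference
            ref += n


def collect_junctions(alignments):
    """Count supporting reads per junction over (pos, cigar) pairs."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(junctions(pos, cigar))
    return counts


if __name__ == "__main__":
    # Made-up first-pass alignments: two spliced reads and one unspliced read.
    first_pass = [(1001, "75M1832N75M"), (1010, "66M1832N84M"), (500, "150M")]
    for junction, support in collect_junctions(first_pass).items():
        print(junction, support)
```

In a two-pass scheme, a junction list like this would then be given back to the aligner so that reads overlapping those junctions can be realigned across them.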
thanks again Devon ... really appreciate it :))