Hi,
I have RNA-seq data (PE 2x300) for non-model organisms. I want to align the reads against the assembled genome using BLAT to identify introns. Since BLAT cannot align paired-end reads, I want to convert them into single-end reads by merging the overlapping R1 and R2.
I would like to know whether it makes sense to merge R1 and R2 into one read (when they overlap), or whether I should work with R1 and R2 separately.
My Best Regards, Prasoon
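To make the merging idea in the question concrete, here is a minimal Python sketch of joining an overlapping R1/R2 pair. It is only an illustration: the function names (`reverse_complement`, `merge_pair`) and the thresholds (`min_overlap`, `max_mismatch_frac`) are assumptions made for this example, and it ignores base qualities. Dedicated read mergers such as FLASH, PEAR or BBMerge handle qualities and adapter read-through and are the better choice for real data.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair if the 3' end of R1 overlaps the
    reverse-complemented R2; return None when no overlap is found.

    min_overlap and max_mismatch_frac are illustrative thresholds,
    not values taken from any particular tool.
    """
    r2_rc = reverse_complement(r2)
    # Try the longest candidate overlap first and shrink it step by step.
    for olen in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        tail = r1[-olen:]      # suffix of R1
        head = r2_rc[:olen]    # prefix of reverse-complemented R2
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep R1 as is and append the non-overlapping part of R2.
            return r1 + r2_rc[olen:]
    return None


# Toy example: a 40 bp fragment sequenced with 25 bp reads from each end.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCTAAGTCCA"
r1 = fragment[:25]
r2 = reverse_complement(fragment[15:])
merged = merge_pair(r1, r2, max_mismatch_frac=0.0)
print(merged == fragment)
```

Pairs for which `merge_pair` returns None do not overlap (or overlap too poorly) and would have to be aligned as two separate single-end reads.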
Hi Nicolas,
Thanks for your reply. I am working on a genome in which non-canonical (non-GT-AG) splice sites are possibly predominant. That's why I don't want to align with STAR. I don't know whether the kallisto aligner is suitable for my case.
STAR handles non-canonical splice sites well. If you want to reduce the penalty for non-canonical splice sites, change --scoreGapNoncan to -4 or even 0 (default is -8). You may also set --scoreGapGCAG and --scoreGapATAC to 0. Be aware that this may increase the number of false-positive spliced reads. I suggest reading the STAR manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf . All parameters are well explained :)
RNA-seq reads do not contain the non-canonical splice site sequences, since those are located in introns. That issue is therefore moot if you are using option B with a transcriptome from your genome.
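As a rough illustration of the STAR scoring options discussed in this thread, the Python sketch below assembles a STAR command line with relaxed penalties for non-canonical junctions. The helper name `star_command`, the index directory, the FASTQ file names and the output prefix are all placeholders invented for this example; only the STAR option names themselves come from the STAR manual. The command is printed rather than executed, so it can be checked before running.

```python
import shlex


def star_command(genome_dir, reads, out_prefix, noncan=-4, gcag=0, atac=0):
    """Build a STAR argument list with relaxed non-canonical junction penalties.

    The default values here follow the suggestion in the thread
    (they are not STAR's own defaults).
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        "--scoreGapNoncan", str(noncan),  # penalty for non-canonical junctions
        "--scoreGapGCAG", str(gcag),      # penalty for GC/AG junctions
        "--scoreGapATAC", str(atac),      # penalty for AT/AC junctions
    ]


# Placeholder paths; replace with your own index and FASTQ files.
cmd = star_command("star_index", ["R1.fastq", "R2.fastq"], "relaxed_")
print(shlex.join(cmd))
```

Comparing the junctions reported with these relaxed values against a run with default scoring is one way to gauge how many additional, possibly false-positive, junctions the change introduces.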
RNA-seq data often contain reads from intronic regions or unspliced transcripts. I was following the discussion in "Why Are There Many Rna-Seq Hits To Intronic Regions?"