Hi,
I have Illumina 100 bp paired-end RNA-Seq data from a non-model species (a cephalopod). I mapped it with STAR to the genome of a closely related species, mainly to see whether I could use that genome for a genome-guided assembly. With default parameters I got an overall alignment rate of 0.93%, and adjusting the parameters only raised it to 1-2%.
I'm not really interested in increasing the mapping rate at this point, since this was mainly an exploratory analysis. However, I would like to know what downstream analyses I can do on the reads that did map (I'm assuming these would be tRNAs, histones, etc.). Basically, I want to make plots for the sequences that aligned, but I'm not sure which programs I can use to represent my data. Any ideas would be great :)
I also wanted to know if others have done a similar analysis and got similar results? What conclusions did you derive from this sort of analysis?
A guided assembly from RNA-seq? What for? In any case, if the mapping percentage to a related species is this low, you can do a de novo transcriptome assembly, predict ORFs, and BLAST them to predict functions, etc. I don't think plotting your current results makes much sense, because the reads that mapped may just reflect sequencing noise. If you still want to try, you can work from the .sam file with HTSeq or even samtools view.
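If you do want a quick look at where the few mapped reads landed, here is a minimal sketch in plain Python that tallies mapped reads per reference sequence straight from SAM text. The function name is made up for illustration, and it assumes an uncompressed SAM file (not BAM). The resulting counts could then be fed into whatever plotting tool you prefer.

```python
from collections import Counter

def count_mapped_per_reference(sam_lines):
    """Tally primary mapped alignments per reference sequence.

    sam_lines: any iterable of SAM text lines (e.g. an open file handle).
    Each mate of a paired-end read is counted separately.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        # FLAG bits: 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[fields[2]] += 1        # column 3 is the reference name (RNAME)
    return counts

# Hypothetical usage (the path is a placeholder):
# with open("my_alignments.sam") as handle:
#     for ref, n in count_mapped_per_reference(handle).most_common(20):
#         print(ref, n, sep="\t")
```

If nearly all hits pile up on a handful of scaffolds, that would at least hint at which regions are conserved enough to attract reads, though it says nothing about whether those hits are meaningful.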
I think the first thing to do when you get a very low mapping percentage is to take some of your reads, BLAST them against the NCBI nr database, and see what kinds of organisms you get hits to. Your reads may not be what you expected them to be.
Can't upvote what mastal511 said enough. You literally have less than a 1% idea of what your data is. For all you know, it could be contaminated.
On a side note: if there is no reference genome, why don't you attempt a de novo assembly and try to get it published?
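To pull out "some of your reads" for that check, something like the following sketch could work. It reservoir-samples a fixed number of records from a FASTQ file and writes them as FASTA, which is the format BLAST takes as a query. The function name and default values are illustrative, and it assumes an uncompressed FASTQ with exactly four lines per record.

```python
import random

def sample_fastq_to_fasta(fastq_path, fasta_path, n_reads=1000, seed=42):
    """Reservoir-sample up to n_reads records from a FASTQ file, write FASTA.

    Assumes an uncompressed, strictly 4-line-per-record FASTQ.
    Returns the number of reads written.
    """
    rng = random.Random(seed)   # fixed seed so the sample is reproducible
    reservoir = []
    with open(fastq_path) as fq:
        index = 0
        while True:
            header = fq.readline()
            if not header:              # end of file
                break
            seq = fq.readline().strip()
            fq.readline()               # '+' separator line, ignored
            fq.readline()               # quality line, ignored
            record = (header.strip()[1:], seq)   # drop the leading '@'
            if index < n_reads:
                reservoir.append(record)
            else:
                # Keep the new record with probability n_reads / (index + 1)
                j = rng.randint(0, index)
                if j < n_reads:
                    reservoir[j] = record
            index += 1
    with open(fasta_path, "w") as fa:
        for name, seq in reservoir:
            fa.write(f">{name}\n{seq}\n")
    return len(reservoir)
```

Sampling across the whole file rather than taking the first few thousand reads avoids a possible bias from the start of the run. The resulting FASTA can be uploaded to the NCBI BLAST web page or used as a query for a local BLAST search.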