Hi all,
some time ago we had mRNA-seq done at a sequencing facility for three non-model invertebrates. Samples were sequenced on a NovaSeq 6000, PE 2x150, with ~40-55M, and everything should be strand-specific. The facility also did the bioinformatics for us: de novo transcriptomes and a bunch of downstream analyses. Recently, I became interested in bioinformatics and wanted to learn to do some things myself. This led me to check how stranded our data actually is, and I am getting somewhat confusing results:
Situation 1 - species without the genome available:
Mapping the reads with salmon against the facility-assembled transcriptome, with automatic library type detection, gave back this: Mapping rate = 81.3943%; Automatically detected most likely library type as IU. I tried mapping the reads with kallisto as well, with --fr-stranded, with --rf-stranded, and unstranded. For the first two, the mapping rate was around 40%, and for the unstranded run around 75%.
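To put a number on the salmon result, the forward and reverse fragment counts can be compared directly. Below is a minimal sketch, assuming the counts come from the `ISF` and `ISR` entries of salmon's `lib_format_counts.json`. The function names and the 0.9 cutoff are my own assumptions for illustration, not salmon's internal rule, and the example counts are made up.

```python
def stranded_fraction(counts):
    """Return (forward, reverse) fractions from ISF/ISR fragment counts."""
    isf = counts.get("ISF", 0)
    isr = counts.get("ISR", 0)
    total = isf + isr
    if total == 0:
        raise ValueError("no ISF/ISR fragments counted")
    return isf / total, isr / total


def call_library_type(counts, stranded_cutoff=0.9):
    """Call ISF, ISR, or IU; the cutoff is an illustrative assumption."""
    isf_frac, isr_frac = stranded_fraction(counts)
    if isf_frac >= stranded_cutoff:
        return "ISF"
    if isr_frac >= stranded_cutoff:
        return "ISR"
    return "IU"


# Made-up counts with a roughly even split between orientations.
example = {"ISF": 4_800_000, "ISR": 5_200_000}
print(stranded_fraction(example))
print(call_library_type(example))
```

With real data, the dictionary could be loaded with `json.load` from the `lib_format_counts.json` file in the salmon output directory. A roughly even split would be in line with the kallisto runs, where each stranded mode captured about half of what the unstranded run did.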
Situation 2 - two species for which we now have genomes, currently only assemblies:
I mapped the reads with STAR on Galaxy, using the option to supply my own reference genome and build a temporary index. When I run infer_experiment on the result, it shows non-strand-specific libraries (around 45% + 55%) for both species. I also checked the BAM files in IGV, and can see that some positions along the genomes have only F1R2-orientation reads mapped, some have only F2R1, and some are completely mixed.
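The IGV observation can also be tallied rather than eyeballed. The sketch below classifies a paired-end alignment as F1R2 or F2R1 from its SAM flag, using the standard flag bits from the SAM specification. The function name is hypothetical and the flag values in the example are made up, not taken from my BAM files.

```python
from collections import Counter

# Standard SAM flag bits.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
REVERSE, MATE_REVERSE = 0x10, 0x20
READ1, READ2 = 0x40, 0x80


def pair_orientation(flag):
    """Return 'F1R2', 'F2R1', or None for pairs that are not forward/reverse."""
    if not (flag & PAIRED) or (flag & (UNMAPPED | MATE_UNMAPPED)):
        return None
    is_rev = bool(flag & REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)
    if is_rev == mate_rev:
        return None  # both mates on the same strand
    if flag & READ1:
        return "F2R1" if is_rev else "F1R2"
    if flag & READ2:
        return "F1R2" if is_rev else "F2R1"
    return None


# Made-up flags: 99/147 form an F1R2 pair, 83/163 form an F2R1 pair.
flags = [99, 147, 83, 163]
print(Counter(pair_orientation(f) for f in flags))
```

In a stranded library, pairs over a single gene are expected to share one orientation, with genes on the opposite strand showing the other, so a tally per region may be more informative than a single genome-wide one.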
I guess my question is: what am I missing or doing wrong in determining how stranded my data is, given that it should be stranded but I don't see it? And do you have any suggestions on what to try or change here? We are waiting for an answer from the facility about the library prep and the exact assembly parameters, but it would be useful to know how to check it ourselves.
Thanks for any input!
Hi Magdalena,
I can't really comment on what it is you have done wrong. It looks like you have followed steps similar to those in how_are_we_stranded_here. Maybe it would be worthwhile giving their package a go to see how your results compare between the two approaches?
Hi Jack,
thanks for the suggestion. I had some problems getting how_are_we_stranded_here to work, so I just gave up on it :) In the meantime, I had it confirmed that the libraries were indeed prepared as stranded, so I am now considering the possibility that they are sub-optimal in terms of strandedness.