Hey everyone,
I have performed an RNA-seq experiment with two closely related species; for one of them I have an assembled transcriptome from previous sequencing. I mapped the reads from both species to this existing transcriptome in two ways: once with CLC Genomics Workbench, using looser parameters for the cross-species mapping so that the mapping percentages were roughly equal (around 80%), and once in protein space with blat, using default parameters for both species. I started exploring the datasets with the DESeq2 package in R and noticed that the PCA plots differ massively between the two mapping procedures when both species are analysed together. However, there is very little difference when each species is analysed separately. When I plot the normalized values of randomly chosen contigs, I also see very few differences between the two procedures, so what exactly is happening here? And can I conclude from the PCA which mapping procedure is better, or is that not possible at all?
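To make the comparison concrete, here is a minimal sketch (plain Python rather than R, so that it runs without DESeq2) of one way to look for contigs that could drive such a difference. It rests on an assumption, not a finding: if one procedure shifts a contig's level for all samples of one species by a similar amount, a per-species PCA (which centres within the species) would barely change, while the combined PCA would. The function names, the sample layout, and the toy numbers are all hypothetical; in practice the inputs would be the normalized counts exported from DESeq2 for each procedure.

```python
import math
from statistics import mean


def species_means(log_values, species):
    """Mean log2 value per species for a single contig."""
    groups = {}
    for value, sp in zip(log_values, species):
        groups.setdefault(sp, []).append(value)
    return {sp: mean(vals) for sp, vals in groups.items()}


def contrast_shift(norm_a, norm_b, species, ref, other):
    """For each contig present in both tables, return how much the
    between-species log2 contrast (other - ref) changes from
    procedure A to procedure B."""
    shifts = {}
    for contig in norm_a.keys() & norm_b.keys():
        log_a = [math.log2(v + 1) for v in norm_a[contig]]
        log_b = [math.log2(v + 1) for v in norm_b[contig]]
        m_a = species_means(log_a, species)
        m_b = species_means(log_b, species)
        shifts[contig] = (m_b[other] - m_b[ref]) - (m_a[other] - m_a[ref])
    return shifts


# Toy, made-up values purely for illustration (not real data).
# Columns follow the order of `species`; sp1 is the species with the reference.
species = ["sp1", "sp1", "sp2", "sp2"]
clc = {"contig_1": [15, 15, 15, 15], "contig_2": [7, 7, 7, 7]}
blat = {"contig_1": [15, 15, 15, 15], "contig_2": [7, 7, 1, 1]}

shifts = contrast_shift(clc, blat, species, ref="sp1", other="sp2")

# Contigs with the largest absolute shift are the ones to inspect first.
ranked = sorted(shifts, key=lambda c: abs(shifts[c]), reverse=True)
for contig in ranked:
    print(contig, round(shifts[contig], 2))
```

Ranking all contigs this way, instead of plotting a random handful, would show whether the combined-species difference comes from a small set of contigs with species-specific mapping behaviour. It does not by itself say which procedure is closer to the truth.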
NB: the sequencing output was 50 bp single-end reads, which I fear may be too short for blat mapping. All of the papers that use this method had reads of at least 100 bp.
Thank you!
I don't see any issues, but I've not done any benchmarking to be absolutely sure.