2.9 years ago
Takuma
I have DNA and RNA reads from a beetle and have done a de novo genome assembly. The scaffold N50 is about 16,000 bp and the BUSCO score is 96%. Is it okay to map the RNA reads to this genome?
Anyway, I tried mapping the RNA reads to the genome using HISAT2.
The alignment rate is 80%.
Is that normal?
The other approach I am considering is mapping the RNA reads to a de novo assembled transcriptome. Its N50 is 1,800 bp and its BUSCO score is 89%.
I am wondering whether I should map the RNA reads to the genome or to the transcriptome. Which is better?
Thanks
On a side note: N50 for a transcriptome assembly is a pointless measure ;-)
This kind of depends on what it is you want to do with the mapped reads, I guess?
Indeed, it depends. If you want transcript quantification for differential expression, you can get the best of both worlds: use the selective alignment implemented in salmon. It can take the transcriptome as the reference and the genome as a decoy. It then maps reads to the transcriptome while also checking whether a better alignment for each read exists in the genome; if so, the read is not used for transcript quantification. This is a good way to compensate for gDNA contamination and to remove spurious alignments to the transcriptome.
Thanks, helpful advice! If some reads map better to the genome than to the transcriptome, does salmon write those reads out as a FASTQ file?
No, I do not think so, but AFAIR it has an option to output a BAM file, so you could parse the unmapped reads from there. I have never done that, so please double-check with the manual.
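For anyone who wants to try the decoy-aware salmon setup discussed above, here is a minimal Python sketch of preparing its two inputs: a decoy list holding the genome scaffold names, and a combined FASTA with the transcripts first and the genome scaffolds last. The function names and the file names (`decoys.txt`, `gentrome.fa`) are my own choices for illustration, and the salmon commands in the trailing comments are how I understand the tool to be invoked, so please check them against the manual for your salmon version.

```python
from pathlib import Path


def fasta_names(fasta_path):
    """Return sequence names (first word of each '>' header) in file order."""
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def build_decoy_inputs(transcripts_fa, genome_fa, out_dir):
    """Write decoys.txt and gentrome.fa (hypothetical names) into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    decoys_txt = out_dir / "decoys.txt"
    gentrome_fa = out_dir / "gentrome.fa"

    # One genome scaffold name per line; these are the decoy sequences.
    decoys_txt.write_text(
        "".join(name + "\n" for name in fasta_names(genome_fa))
    )

    # Transcripts must come first and the decoy (genome) sequences last.
    with open(gentrome_fa, "w") as out:
        for source in (transcripts_fa, genome_fa):
            last = "\n"
            with open(source) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Guard against a missing final newline gluing two records together.
            if not last.endswith("\n"):
                out.write("\n")
    return decoys_txt, gentrome_fa


# Afterwards, roughly (assumed invocation, verify with `salmon index --help`):
#   salmon index -t gentrome.fa -d decoys.txt -i salmon_index
#   salmon quant -i salmon_index -l A -1 reads_1.fq -2 reads_2.fq -o quant_out
```

Calling `build_decoy_inputs("transcripts.fa", "genome.fa", "salmon_inputs")` would write both files into `salmon_inputs/`; the input paths here are placeholders for your own assemblies.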
My genes of interest are split across more than two scaffolds in the genome. When mapping, does it matter that a gene of interest is divided over multiple scaffolds because of the short N50?
Yes. I think none of the tools will be able to handle this (but I am not sure about it).