I have RNA sequencing reads from Nanopore, and I want to align them to a transcriptome reference.
However, I don't know where to get a transcriptome reference, e.g. from NCBI or another database. My questions are:
1. In NCBI, if I download a complete genome in coding-sequence (CDS) format, can I use those coding sequences as a transcriptome reference?
2. If not, how do I get a transcriptome reference?
Thank you.
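For question 1: yes, the protein-coding sequences should be fine to use as a transcriptome reference. You just won't be able to map reads (or read portions) that come from non-coding RNAs or untranslated regions, since CDS records contain only the coding part.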
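A quick sanity check before using a downloaded CDS file as a reference is to load it and look at the records. The sketch below is a minimal pure-Python FASTA reader, assuming the CDS download is a standard multi-FASTA file; the record IDs and sequences in the example are made up for illustration, not taken from any real genome.

```python
import io
from typing import Dict, Iterable, List, Optional


def read_fasta(handle: Iterable[str]) -> Dict[str, str]:
    """Parse a multi-FASTA stream into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Use the first whitespace-separated token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line.upper())  # sequences may be wrapped over many lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical CDS records, for illustration only.
example = """>cds_1 hypothetical gene A
ATGAAACCC
GGGTAA
>cds_2 hypothetical gene B
ATGTTTTGA
"""

cds = read_fasta(io.StringIO(example))
for record_id, seq in cds.items():
    # Print the ID, the length, and whether the length is a whole number of codons.
    print(record_id, len(seq), len(seq) % 3 == 0)
# prints:
# cds_1 15 True
# cds_2 9 True
```

For real data, pass an opened file handle instead of the inline string. A CDS length that is not a multiple of 3 may indicate a partial record worth a closer look.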
You tagged this "Virus". Whether you need to download a transcriptome at all depends on whether you use alignment or pseudo-alignment. Two things to note:
In viruses there is usually little or no splicing and there are few intergenic regions, so the transcriptome will be more or less identical to the genome (although if you want to use Salmon or kallisto, you do need a transcriptome for pseudo-alignment).
In principle, you can simply align against the genome sequence, e.g. with BWA-MEM, which is quite straightforward. A splice-aware aligner will also work but isn't required for most viral sequences (check the genome annotation if in doubt).
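To illustrate why the genome can stand in for the transcriptome when there is no splicing: each transcript is then just a contiguous slice of the genome (or of its reverse complement). The sketch below assumes a hypothetical 30 nt genome and hypothetical 1-based, inclusive feature coordinates in the style of a GFF/GenBank annotation; it extracts each feature and checks that it occurs in the genome on one strand or the other.

```python
from typing import Dict, List, Tuple

# (name, start, end, strand), 1-based inclusive coordinates as in GFF/GenBank.
Feature = Tuple[str, int, int, str]

# Complement table for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(genome: str, features: List[Feature]) -> Dict[str, str]:
    """Slice each unspliced feature out of the genome, oriented 5'->3'."""
    out: Dict[str, str] = {}
    for name, start, end, strand in features:
        seq = genome[start - 1:end]  # convert 1-based inclusive to a Python slice
        out[name] = reverse_complement(seq) if strand == "-" else seq
    return out


# Hypothetical genome and annotation, for illustration only.
genome = "ATGAAACCCGGGTAAGCTTCAAAGTTTCAT"
features: List[Feature] = [("geneA", 1, 15, "+"), ("geneB", 19, 30, "-")]

transcripts = extract_features(genome, features)
rc_genome = reverse_complement(genome)
for name, seq in transcripts.items():
    # Without splicing, every transcript is found verbatim on one genome strand.
    print(name, seq, seq in genome or seq in rc_genome)
# prints:
# geneA ATGAAACCCGGGTAA True
# geneB ATGAAACTTTGA True
```

Because every transcript is contained in the genome, reads aligned to the genome cover the same sequence they would cover on such a transcriptome; the extracted FASTA is mainly useful when a tool such as Salmon or kallisto requires transcript sequences.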
Thank you, everyone. I understand it better now and will follow your guidance.