How to get a transcriptome reference?
23 months ago

I have RNA sequences from nanopore sequencing, and I want to align them against a transcriptome reference.

But I don't know how to get a transcriptome reference from NCBI or any other database. So my questions are:

  1. In NCBI, if a complete genome is available and I download it in coding-sequence format, can I use those coding sequences as a transcriptome reference?

  2. If not, how do I get a transcriptome reference?

Thank you.

reference mRNA RNA Virus modification • 1.7k views

For question 1, yes, the protein-coding sequences should be fine to use as a transcriptome reference. You just won't be able to map against non-coding RNA.


You tagged this "Virus". Whether you need to download a transcriptome at all depends on your method of alignment or pseudo-alignment. Two things:

  • Viral genomes are small, so you won't have many challenges indexing or aligning to them.
  • There is little if any splicing, and there are few intergenic regions, so the transcriptome will be more or less identical to the genome (of course, if you want to use salmon or kallisto, you need the transcriptome for pseudo-alignment).

In principle, you should simply align against the genome sequence using e.g. BWA-MEM, which is quite straightforward. A splicing-aware aligner will work but isn't required for viral sequences (just check the genome annotation if in doubt).
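One way to "check the genome annotation" is to count exons per transcript: any transcript with more than one exon is spliced. The following is only a minimal sketch, assuming the annotation is in GTF format with `exon` features and a `transcript_id` attribute (many NCBI viral records ship as GFF3 instead, which would need a different attribute parser). The function names and the toy annotation are made up for illustration.

```python
import re
from collections import Counter


def count_exons_per_transcript(gtf_lines):
    """Count 'exon' features per transcript_id in GTF-formatted lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows matter for splicing
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def spliced_transcripts(gtf_lines):
    """Return the sorted IDs of transcripts that have more than one exon."""
    counts = count_exons_per_transcript(gtf_lines)
    return sorted(tid for tid, n in counts.items() if n > 1)


# Hypothetical toy annotation: t1 has two exons (spliced), t2 has one.
example_gtf = [
    'virus1\ttoy\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'virus1\ttoy\texon\t201\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'virus1\ttoy\texon\t400\t900\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]

print(spliced_transcripts(example_gtf))
```

If the list comes back empty for your annotation, aligning directly to the genome with a non-spliced aligner should be reasonable.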


Thank you, everyone. I understand it better now and will follow your guidance.

23 months ago
barslmn ★ 2.3k

There is a manual about it here: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

23 months ago
dsull ★ 6.9k

There are plenty of places where you can get transcriptome references. E.g. https://www.gencodegenes.org/human/ -- has FASTA files designated "Transcript sequences".

Otherwise, you can use a tool to extract cDNA regions from a genome FASTA based on annotations from a GTF. For example, in the kb-python package (for kallisto and bustools), you can supply the kb ref command with a FASTA and a GTF, and it will output a transcriptome FASTA. There are other tools out there with similar functionality.
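To make the idea concrete, here is a rough Python sketch of what such a tool does: collect the exons of each transcript from the GTF, join their sequences in genomic order, and reverse-complement transcripts on the minus strand. It is an illustration under simplifying assumptions (GTF with `exon` rows and a `transcript_id` attribute, 1-based inclusive coordinates, genome small enough to hold in memory), not how kb ref is actually implemented. All function names and the toy data are hypothetical.

```python
import re
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines):
    """Read FASTA lines into a dict of {sequence name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def extract_transcripts(genome, gtf_lines):
    """Build {transcript_id: spliced sequence} from exon rows of a GTF."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append(
                (fields[0], int(fields[3]), int(fields[4]), fields[6])
            )
    transcripts = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda exon: exon[1])  # genomic order by start
        # GTF coordinates are 1-based and inclusive, hence start - 1.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        transcripts[tid] = seq
    return transcripts


# Hypothetical toy genome and annotation, for illustration only.
toy_genome = parse_fasta([">chrToy", "ACGTACGT", "TTGG"])
toy_gtf = [
    'chrToy\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chrToy\ttoy\texon\t9\t12\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chrToy\ttoy\texon\t9\t12\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
toy_transcripts = extract_transcripts(toy_genome, toy_gtf)

for tid, seq in sorted(toy_transcripts.items()):
    print(f">{tid}\n{seq}")
```

For real data, an established tool is the safer choice, since it handles edge cases (attribute variants, malformed records, large genomes) that this sketch ignores.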
