Question

Genome type to build transcript reference with RSEM?

10.2 years ago

rna-seq_researcher ▴ 60

Hi all,

I am currently preparing the reference transcriptome that RSEM will use in my RNA-seq experiments. For this, I run rsem-prepare-reference with the GTF and FASTA files downloaded from Ensembl (latest release, v80).

However, I have a question about the masking level of the genome, which is available unmasked (complete), soft-masked, or hard-masked for repetitive sequences. Does the masking level matter when I build the transcript reference? For example, if I use a hard-masked genome instead of the unmasked one, will that have a large impact on my final transcript set (given that I use the same GTF coordinates in both cases)?

I ask because the human transcriptome appears to contain some repetitive sequences, and I don't know whether these sequences are completely lost in the hard-masked genome.

Does anyone have any insight on this?

Thanks!

rsem RNA-Seq genome alignment • 3.0k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 10.2 years ago by rna-seq_researcher ▴ 60

True! I just checked my transcripts.fa file and there are some small sequences (~10-20) full of Ns...

Thank you very very much!
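
For anyone who wants to run the same check on their own transcripts.fa, here is a minimal Python sketch that reports the fraction of Ns per transcript. The function names (read_fasta, n_fraction, flag_n_rich), the 0.5 threshold, and the toy records are all made up for illustration; none of this is part of RSEM.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_fraction(seq: str) -> float:
    """Fraction of bases that are N (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def flag_n_rich(lines: Iterable[str], threshold: float = 0.5) -> Dict[str, float]:
    """Map record ID to N fraction for records at or above the (hypothetical) threshold."""
    flagged: Dict[str, float] = {}
    for name, seq in read_fasta(lines):
        frac = n_fraction(seq)
        if frac >= threshold:
            flagged[name] = frac
    return flagged


# Toy records standing in for a real transcripts.fa (hypothetical data).
demo = [">tx1 gene=A", "ACGTNNNN", "NNNN", ">tx2", "acgtacgt", ">tx3", "NNNNNNNN"]
for name, frac in flag_n_rich(demo).items():
    print(f"{name}\t{frac:.2f}")
```

To scan a real file, pass an open handle instead of the toy list, e.g. `with open("transcripts.fa") as fh: flag_n_rich(fh)`.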

ADD REPLY • link 10.2 years ago by rna-seq_researcher ▴ 60

score 2 · Accepted Answer · 2015-06-02

10.2 years ago

Devon Ryan 105k

I would strongly encourage you not to use the hard-masked genome for this. Hard-masking replaces repetitive sequence with Ns, so you're pretty much guaranteed to end up with a bunch of excess Ns in the resulting transcript sequences. Either the soft-masked or the plain (unmasked) fasta files will work fine; in fact, they should produce equivalent results, since soft-masking only changes the case of the bases.
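
To make the difference concrete, here is a toy Python sketch of what the two masking styles do to a transcript cut from the same coordinates. The sequence, the repeat interval, and the exon coordinates are invented for illustration, and the helper functions are hypothetical; this is not how RSEM or any repeat masker is implemented. The "equivalent" comparison also assumes the downstream tools ignore case.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, end-exclusive


def soft_mask(seq: str, repeats: List[Interval]) -> str:
    """Lowercase the repeat intervals; the bases themselves are preserved."""
    out = list(seq)
    for start, end in repeats:
        out[start:end] = seq[start:end].lower()
    return "".join(out)


def hard_mask(seq: str, repeats: List[Interval]) -> str:
    """Replace the repeat intervals with Ns; the original bases are lost."""
    out = list(seq)
    for start, end in repeats:
        out[start:end] = "N" * (end - start)
    return "".join(out)


# Toy "genome" with an 8-bp repeat in the middle (hypothetical data).
genome = "ACGTACGT" + "TTTTTTTT" + "ACGTACGT"
repeats = [(8, 16)]
exon = (4, 20)  # hypothetical exon overlapping the repeat

plain_exon = genome[exon[0]:exon[1]]
soft_exon = soft_mask(genome, repeats)[exon[0]:exon[1]]
hard_exon = hard_mask(genome, repeats)[exon[0]:exon[1]]

print("plain:", plain_exon)
print("soft: ", soft_exon)
print("hard: ", hard_exon)
print("soft equals plain ignoring case:", soft_exon.upper() == plain_exon)
print("Ns in hard-masked exon:", hard_exon.count("N"))
```

With identical GTF coordinates, the soft-masked exon differs from the plain one only in case, while the hard-masked exon has lost every base inside the repeat.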
