Which genome type (masking level) to use when building a transcript reference with RSEM?
9.6 years ago

Hi all,

Currently I am preparing the reference transcriptome used by RSEM for RNA-seq experiments. For this, I run rsem-prepare-reference with GTF and FASTA files downloaded from Ensembl (latest release, v.80).
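For readers who want to reproduce this step, here is a minimal sketch of how the call could be assembled from Python. It assumes the usual `rsem-prepare-reference --gtf <annotation> <genome fasta> <reference name>` form; the file names and the output prefix are hypothetical placeholders, not the files used in the question, and the command is only printed, not run.

```python
import shlex


def build_rsem_prepare_command(gtf_path, genome_fasta, reference_name):
    """Return the argument list for rsem-prepare-reference (nothing is executed)."""
    return ["rsem-prepare-reference", "--gtf", gtf_path, genome_fasta, reference_name]


# Hypothetical paths: substitute the GTF and genome FASTA you downloaded.
cmd = build_rsem_prepare_command("annotation.gtf", "genome.fa", "ref/my_reference")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)`, assuming RSEM is installed and on the PATH.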

However, I have some questions about the masking level of the genome (which can be the complete, unmasked genome, or soft- or hard-masked for repetitive sequences). Does the masking level matter when I build the transcript reference? For example, if I use a hard-masked genome instead of the complete genome, will that have a large impact on my final transcript set (given that I will be using the same GTF coordinates in both scenarios)?

I ask because I have seen that the human transcriptome contains some repetitive sequences, and I don't know whether these sequences are completely lost in the hard-masked genome.

Does anyone have some insight on that matter?

Thanks!

rsem RNA-Seq genome alignment • 2.8k views

True! I just checked my transcripts.fa file and there are some small sequences (~10-20) full of Ns...
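A check like this can be scripted. The following is a rough sketch, not part of RSEM: it parses a FASTA file with a small hand-written reader and lists records whose fraction of N bases reaches a threshold. The function names and the 0.5 threshold are illustrative assumptions, and the demo uses an in-memory toy FASTA rather than a real transcripts.fa.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_fraction(seq):
    """Fraction of bases that are N (case-insensitive); 0.0 for an empty sequence."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def flag_n_rich(records, threshold=0.5):
    """Return (name, N fraction) for records at or above the threshold (assumed cutoff)."""
    flagged = []
    for name, seq in records:
        frac = n_fraction(seq)
        if frac >= threshold:
            flagged.append((name, round(frac, 3)))
    return flagged


# Toy input standing in for a real transcript FASTA.
demo = io.StringIO(">tx1\nACGTACGT\n>tx2\nNNNNNNNNAC\n>tx3\nacgtNNnn\n")
flagged = flag_n_rich(read_fasta(demo))
print(flagged)
```

On a real file, the same functions could be used as `with open("transcripts.fa") as fh: print(flag_n_rich(read_fasta(fh)))`, with the path adjusted to wherever the reference was written.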

Thank you very very much!

9.6 years ago

I would strongly encourage you not to use the hard-masked genome for this. Hard-masking replaces repetitive sequence with Ns, so any exon that overlaps a repeat will come out as a run of Ns in the resulting transcript sequences. Either the soft-masked or the plain FASTA file will work fine (they should in fact produce equivalent results, since soft-masking only lowercases the repeats).
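To make the difference concrete, here is a toy sketch (not RSEM's own code) that extracts the same exon coordinates from an unmasked, a soft-masked, and a hard-masked copy of a made-up 20 bp sequence. The sequence, the repeat interval, and the exon coordinates are all invented for illustration, and only the plus strand is handled.

```python
def extract_transcript(genome, exons):
    """Concatenate exon sequences; exons are 1-based inclusive (start, end) pairs, as in GTF."""
    return "".join(genome[start - 1:end] for start, end in exons)


def soft_mask(seq, start, end):
    """Lowercase the 1-based inclusive interval, as soft-masking does for repeats."""
    return seq[:start - 1] + seq[start - 1:end].lower() + seq[end:]


def hard_mask(seq, start, end):
    """Replace the 1-based inclusive interval with Ns, as hard-masking does for repeats."""
    return seq[:start - 1] + "N" * (end - start + 1) + seq[end:]


unmasked = "ACGTACGTTTAGGCATCGAT"   # invented toy chromosome
repeat = (9, 14)                     # invented repeat interval
exons = [(3, 6), (11, 18)]           # second exon overlaps the repeat

plain_tx = extract_transcript(unmasked, exons)
soft_tx = extract_transcript(soft_mask(unmasked, *repeat), exons)
hard_tx = extract_transcript(hard_mask(unmasked, *repeat), exons)

print(plain_tx)  # GTACAGGCATCG
print(soft_tx)   # GTACaggcATCG
print(hard_tx)   # GTACNNNNATCG
```

The soft-masked transcript differs from the plain one only in letter case, while the hard-masked one has lost the four repeat bases inside the exon. Whether downstream tools treat lowercase and uppercase bases identically is the assumption behind "equivalent results".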
