Question

Hisat2 index genome

6 months ago

frarodmar17 • 0

I want to analyse repeated sequences detected in cell-free RNA-seq data, but I am not clear on how to use the aligner HISAT2. Specifically, I do not know whether I can use the default HISAT2 genome index or whether I need genome_tran or genome_rep. I thought the genome index contained all genome regions (coding regions, non-coding regions, ...), but now I am not sure.

hisat2 • 815 views

Most aligners do not handle repeat sequences well with the default genome or transcriptome references. My suggestion is to find or create a reference containing the RNA species you want to analyse and map reads to it with an aligner. Keep in mind that some aligners struggle when a read maps equally well to multiple locations. I'm not sure exactly what you are trying to do, but Rfam (https://rfam.org/) might be a useful RNA sequence database. A mapper like BWA or STAR could be used.
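To get a feel for how much of a sample is affected by multi-mapping, one option is to tally alignments by their NH tag (the number of reported hits for the read). The following is a minimal sketch, not a definitive implementation. It assumes uncompressed SAM text and an aligner that writes the optional NH:i tag; alignments without that tag are counted separately. The function name count_hits_per_read is made up for illustration.

```python
from collections import Counter

def count_hits_per_read(sam_lines):
    """Classify primary alignments as unique or multi-mapped using NH:i.

    Assumption: the aligner emits the optional NH:i tag. Records without
    it are tallied under "no_NH_tag" rather than guessed at.
    """
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records so each mapped read (or mate) is counted once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        nh = None
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            tally["no_NH_tag"] += 1
        elif nh == 1:
            tally["unique"] += 1
        else:
            tally["multi"] += 1
    return tally

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python nh_tally.py < sample.sam
    print(dict(count_hits_per_read(sys.stdin)))
```

For paired-end data this counts each mate separately. A large "multi" fraction would suggest that many reads come from repeated regions and need care at the quantification step.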

6 months ago by mark.ziemann ★ 2.0k

What I am trying to do is detect repetitive sequences in RNA-seq reads. My plan was to align the reads directly to a reference genome and then use a repeat annotation file (from RepeatMasker) to quantify the reads that fall in repetitive sequences. I have not found any reference genome for repetitive sequences. In fact, the only thing I found was genome_rep from HISAT2: http://daehwankimlab.github.io/hisat2/download/
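The align-then-annotate plan described here comes down to an interval overlap count. Below is a rough sketch of that step only, under several assumptions: the repeat annotation has already been converted to (chrom, start, end, family) tuples with 0-based half-open coordinates, the alignments have been reduced to (chrom, start, end) tuples, and spliced alignments are treated as a single span. The names build_index and count_by_family and the bin size are illustrative choices; established overlap or read-counting tools would normally do this job.

```python
from collections import Counter, defaultdict

BIN = 100_000  # bin width in bp; an arbitrary choice for this sketch

def build_index(repeats):
    """Index repeat intervals by (chrom, bin) for fast lookup.

    repeats: iterable of (chrom, start, end, family), 0-based half-open.
    """
    index = defaultdict(list)
    for chrom, start, end, family in repeats:
        # Register the interval in every bin it touches.
        for b in range(start // BIN, (end - 1) // BIN + 1):
            index[(chrom, b)].append((start, end, family))
    return index

def count_by_family(alignments, index):
    """Count alignments per repeat family.

    Each alignment adds one count to every distinct family it overlaps
    by at least one base.
    """
    counts = Counter()
    for chrom, start, end in alignments:
        families = set()
        for b in range(start // BIN, (end - 1) // BIN + 1):
            for r_start, r_end, family in index.get((chrom, b), ()):
                if start < r_end and r_start < end:   # half-open overlap
                    families.add(family)
        counts.update(families)
    return counts

if __name__ == "__main__":
    # Made-up toy intervals, for illustration only.
    repeats = [("chr1", 100, 400, "Alu"), ("chr1", 350, 900, "L1")]
    alignments = [("chr1", 380, 430), ("chr1", 500, 550)]
    print(count_by_family(alignments, build_index(repeats)))
```

Note that this counts every alignment it is given. How multi-mapped reads are treated (dropped, counted once, or counted at every location) has to be decided before this step, and for repeats that choice can change the result a lot.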

6 months ago by frarodmar17 • 0

Sure, you can try mapping reads to the genome and then quantifying the reads mapped to these locations. However, some repeat elements have thousands of near-exact copies in the genome; aligners struggle with these, so those reads may end up being discarded. Alternatively, the RepeatMasker command-line tool can screen fastq files directly and report the abundances of the main classes of repeats. It is pretty slow, so maybe just use <1M reads to represent each sample.
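Taking a random subset of reads, as suggested above, can be sketched with reservoir sampling, which makes one pass over the file and keeps only the sample in memory. This is a minimal sketch assuming uncompressed, four-line (unwrapped) FASTQ records. The helper names read_fastq, reservoir_sample and fastq_to_fasta are made up here. The FASTA conversion is included because RepeatMasker's usage text describes FASTA input, so the sampled reads may need converting; check the version you have installed.

```python
import random

def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes unwrapped records)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:          # end of file
            return
        yield record

def reservoir_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)     # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec    # replace with probability k / (i + 1)
    return sample

def fastq_to_fasta(records):
    """Convert FASTQ records to FASTA text, dropping the qualities."""
    for header, seq, _plus, _qual in records:
        yield ">" + header[1:] + seq

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names):
    #   python subsample.py < sample.fastq > subset.fasta
    subset = reservoir_sample(read_fastq(sys.stdin), k=1_000_000, seed=42)
    sys.stdout.writelines(fastq_to_fasta(subset))
```

For paired-end data, running this with the same seed on two mate files of equal length selects the same record positions in both, so the pairs stay matched.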

6 months ago by mark.ziemann ★ 2.0k

Okay, thank you very much, Mark. I will try the first option first, because I am looking for repetitive sequences in cell-free RNA regardless of the type of repeat.

6 months ago by frarodmar17 • 0

What are "the repeated sequences"? Keep in mind that we don't know your project at all.

6 months ago by ATpoint 89k

I am referring to repetitive RNA sequences. My question is: if my objective is to detect repetitive sequences, which genome index should I choose?

6 months ago by frarodmar17 • 0