Hisat2 index genome
24 days ago

I want to analyse the repeated sequences detected through cell-free RNA-seq, and I am not clear on how to use the aligner hisat2. Specifically, I do not know whether I can use the default hisat2 genome index or whether I have to use genome_tran or genome_rep. I thought the genome index contained all genome regions (coding regions, non-coding regions, ...), but now I am not sure.

Repeat sequences are not handled well by most aligners using the default genome or transcriptome references. My suggestion is to find or create a reference that has the RNA species that you want to analyse and use an aligner to map reads to it. Just keep in mind that some aligners may struggle when a read maps to multiple hits equally. I'm not sure exactly what you are trying to do, but Rfam (https://rfam.org/) might be a useful RNA sequence database. A mapper like BWA or STAR could be used.
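One way to see how many reads this affects is the NH:i tag, which HISAT2 (and STAR) write to each alignment to report how many hits were found for that read. Below is a minimal plain-Python sketch, assuming SAM text such as the output of `samtools view`, that counts primary alignments as unique (NH:i:1) or multi-mapped (NH greater than 1). The function name `count_multimappers` and the toy records are illustrative only.

```python
from typing import Iterable, Tuple


def count_multimappers(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unique, multi_mapped) counts of primary alignments in SAM text.

    A record is multi-mapped when its NH:i tag is greater than 1.
    Header lines, unmapped, secondary and supplementary records are skipped,
    so each aligned read (or each mate, for paired-end data) is counted once.
    """
    unique = multi = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):
            continue  # unmapped, secondary or supplementary
        nh = 1  # treat a record without an NH tag as unique
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh > 1:
            multi += 1
        else:
            unique += 1
    return unique, multi


# Toy records: r1 is unique, r2 has 5 hits (primary + one secondary), r3 is unmapped.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:5",
    "r2\t256\tchr2\t300\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:5",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
print(count_multimappers(example))  # (1, 1)
```

On real data, `count_multimappers(open("sample.sam"))` works the same way; a large multi-mapped fraction suggests many repeat-derived reads are ambiguous.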

What I am trying to do is detect repetitive sequences in RNA-seq reads, so I thought it would be good to align the reads directly to a reference genome and then use a repeat annotation file (from RepeatMasker) to quantify the reads coming from repetitive sequences. I have not found any reference genome specific to repetitive sequences; in fact, the only thing I found was the genome_rep index from hisat2: http://daehwankimlab.github.io/hisat2/download/
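The counting step of this genome-then-annotation approach can be sketched in plain Python. This is a minimal sketch, assuming the whitespace-separated RepeatMasker `.out` table (query sequence in column 5, begin and end in columns 6 and 7, repeat class/family in column 11) and aligned reads already reduced to (chromosome, start, end) tuples in 1-based, closed coordinates, for example from `bedtools bamtobed` output with 1 added to each start. Each read adds one count to every repeat class it overlaps. The function names are hypothetical; in practice featureCounts or `bedtools intersect`, with the annotation converted to GTF or BED, would do this job on a BAM file.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int, str]  # (begin, end, repeat class/family), 1-based closed


def parse_repeatmasker_out(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Parse RepeatMasker .out lines into per-chromosome intervals sorted by begin."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Data lines start with an integer SW score; the header lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom, begin, end = fields[4], int(fields[5]), int(fields[6])
        by_chrom[chrom].append((begin, end, fields[10]))
    for intervals in by_chrom.values():
        intervals.sort()
    return dict(by_chrom)


def count_reads_per_class(
    reads: Iterable[Tuple[str, int, int]],
    repeats: Dict[str, List[Interval]],
) -> Counter:
    """Count reads overlapping each repeat class; a class counts once per read."""
    starts = {c: [iv[0] for iv in ivs] for c, ivs in repeats.items() if ivs}
    longest = {c: max(e - b + 1 for b, e, _ in ivs) for c, ivs in repeats.items() if ivs}
    counts: Counter = Counter()
    for chrom, r_start, r_end in reads:
        if chrom not in starts:
            continue  # no annotated repeats on this sequence
        # A repeat can only overlap [r_start, r_end] if it begins no later than r_end
        # and no earlier than r_start - longest (otherwise it ends before r_start).
        lo = bisect_left(starts[chrom], r_start - longest[chrom])
        hi = bisect_right(starts[chrom], r_end)
        hits = {cls for b, e, cls in repeats[chrom][lo:hi] if e >= r_start}
        counts.update(hits)
    return counts
```

Reads reported at several loci are counted at each reported locus unless the input is first restricted to primary alignments.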

Sure, you can give it a try by mapping reads to the genome and then quantifying the reads mapped to these locations. However, some of these repeat elements have thousands of near-exact copies in the genome, aligners struggle with them, and so those reads may end up being discarded. Alternatively, the RepeatMasker command-line tool can screen the reads directly (it takes FASTA input, so convert the fastq files first) and give the abundances of the main classes of repeats. It is pretty slow, so maybe just use fewer than 1M reads to represent each sample.
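Drawing such a subset can be done in one pass with reservoir sampling (Algorithm R), which gives every read the same chance of being chosen without knowing the file size in advance. This is a minimal sketch, assuming uncompressed FASTQ with four lines per record; it emits FASTA because that is the input format RepeatMasker expects. The function names are hypothetical, and `seqtk sample` does the same job from the command line.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from uncompressed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        seq, _plus, qual = next(it, ""), next(it, ""), next(it, "")
        if not qual:
            raise ValueError(f"truncated FASTQ record: {header.strip()}")
        # Keep only the read ID (text after '@' up to the first whitespace).
        yield header[1:].split()[0], seq.strip()


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 42) -> List[Record]:
    """Draw k records uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec  # keep the new record with probability k / (i + 1)
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, `to_fasta(reservoir_sample(read_fastq(open("sample.fastq")), 500_000))` returns the FASTA text for a 500k-read subset, ready to write to a file for RepeatMasker.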

Okay, thank you very much, Mark. I will try the first option first, because I am looking for repetitive sequences in cell-free RNA regardless of the type of repeat.

What are "the repeated sequences"? Keep in mind that we don't know your project at all.

I am referring to repetitive RNA sequences. My question is: if my objective is to detect repetitive sequences, which genome index should I choose?
