24 days ago
frarodmar17
I want to analyse the repeated sequences detected in cell-free RNA-seq data, but I am not clear on how to use the aligner HISAT2. Specifically, I do not know whether I can use the default HISAT2 genome index, or whether I have to use genome_tran or genome_rep. I thought that the genome index contained all genomic regions (coding regions, non-coding regions...), but now I am not sure.
Most aligners do not handle repeat sequences well when used with the default genome or transcriptome references. My suggestion is to find or create a reference that contains the RNA species you want to analyse, and use an aligner to map reads to it. Just keep in mind that some aligners may struggle when a read maps equally well to multiple locations. I'm not sure exactly what you are trying to do, but Rfam (https://rfam.org/) might be a useful RNA sequence database. A mapper like BWA or STAR could be used.
What I am trying to do is detect repetitive sequences in RNA-seq reads, so I thought it would be good to align the reads directly to a reference genome and then use a repeat annotation file (from RepeatMasker) to quantify the repetitive sequences detected. I have not found any reference genome for repetitive sequences. In fact, the only thing I found was the genome_rep index from HISAT2: http://daehwankimlab.github.io/hisat2/download/
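To make the plan above concrete, here is a minimal Python sketch of the quantification step: alignments are overlapped with repeat annotation intervals and counted per repeat family. Everything in it is an assumption for illustration: the function name `quantify_repeats`, the tuple layouts, the toy coordinates, and the choice to give a read with N alignments a weight of 1/N per alignment (one possible convention for multi-mapping reads, not something HISAT2 or RepeatMasker prescribes). Parsing of real SAM/BAM and RepeatMasker output files is left out.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def quantify_repeats(alignments, repeats):
    """Count reads per repeat family, splitting multi-mapped reads evenly.

    alignments: list of (read_id, chrom, start, end), one entry per reported hit.
    repeats:    list of (chrom, start, end, family) annotation intervals.
    Both layouts are hypothetical and chosen only for this sketch.
    """
    # Group annotation intervals by chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, family in repeats:
        by_chrom[chrom].append((start, end, family))

    # Number of reported alignments per read, used to weight each hit.
    n_hits = defaultdict(int)
    for read_id, _chrom, _start, _end in alignments:
        n_hits[read_id] += 1

    counts = defaultdict(float)
    for read_id, chrom, start, end in alignments:
        weight = 1.0 / n_hits[read_id]
        # Simple linear scan; fine for a toy example, too slow for a full genome.
        families = {
            family
            for r_start, r_end, family in by_chrom.get(chrom, [])
            if overlaps(start, end, r_start, r_end)
        }
        for family in families:
            counts[family] += weight
    return dict(counts)


# Toy data: r2 maps to two Alu copies, r4 falls outside any annotated repeat.
repeats = [
    ("chr1", 100, 400, "Alu"),
    ("chr1", 1000, 7000, "L1"),
    ("chr2", 50, 350, "Alu"),
]
alignments = [
    ("r1", "chr1", 120, 195),
    ("r2", "chr1", 150, 225),
    ("r2", "chr2", 60, 135),
    ("r3", "chr1", 5000, 5075),
    ("r4", "chr1", 500, 575),
]
print(quantify_repeats(alignments, repeats))
```

Counting at the family level means a read that cannot be placed on a single copy still contributes to its family total, provided the aligner reports its multiple hits instead of discarding them.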
Sure, you can give it a try by mapping reads to the genome and then quantifying the reads mapped to these locations. However, for repeat elements that have thousands of near-exact copies in the genome, aligners struggle, so those reads may end up being discarded. Alternatively, the RepeatMasker command-line tool can screen fastq files directly and give the abundances of the main classes of repeats. It is pretty slow, so maybe use a subsample of fewer than 1M reads to represent each sample.
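For the subsampling suggestion, a minimal sketch of drawing a fixed-size random subset of reads with reservoir sampling, so the whole file never has to sit in memory. The function names `read_fastq` and `subsample_fastq` are made up for this example, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import io
import random
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break  # end of file, or a truncated trailing record
        yield tuple(line.rstrip("\n") for line in record)


def subsample_fastq(handle, k, seed=0):
    """Return up to k records chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    reservoir = []
    for i, record in enumerate(read_fastq(handle)):
        if i < k:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


# Toy input: ten identical-length reads held in memory.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10)))
sample = subsample_fastq(demo, k=3)
print(len(sample))
```

For a real sample, the handle could come from `gzip.open(path, "rt")` with `k=1_000_000`. For paired-end data, both mates would need to be sampled with the same indices, which this sketch does not handle.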
Okay, thank you very much, Mark. I will try the first option first, because I am looking for repetitive sequences in cell-free RNA regardless of the type of repeat.
What are "the repeated sequences"? Keep in mind that we don't know your project at all.
I am referring to repetitive RNA sequences. My question is: if my objective is to detect repetitive sequences, which genome index should I choose?