Question

Align miRNA library (small RNA-seq) without trimming

0

20 months ago

PBC ▴ 10

Hi Everyone!

I would like to align my miRNA library (small RNA) to the genome. I read that trimming before alignment is optional, depending on the downstream analysis. I only want to do DEG analysis, and I am considering not trimming my reads.

I performed the alignment with my untrimmed reads using bowtie1. Around 4-6% of my reads aligned to the genome.

code

bowtie -n 0 -l 30  -a --best --strata  --threads 16 $reference/danRer11  -q  reads_file.fastq -S output.sam

I also ran bowtie2 and got more aligned reads (33%-60%), although that is still not a good alignment result.

code

bowtie2  -p 2 -q --local -x $reference/danRer11 -U reads_file.fastq -S output.sam

Does anyone have a suggestion to improve the alignment?

It seems that bowtie2 does better in --local mode because of the soft clipping it performs. But I also think the poor bowtie1 result could be due to my parameter settings.

Thank you in advance for your answer! :)

BOWTIE2 RNA-SEQ BOWTIE • 2.7k views

ADD COMMENT • link 20 months ago by PBC ▴ 10

0

bowtie 1 is good for short reads (< 50 bp). How long are your reads? bowtie2 is better for reads > 50 bp.

ADD REPLY • link 20 months ago by Ming Tommy Tang ★ 4.5k

0

Hi Ming

Untrimmed reads are 76 bp on average.

ADD REPLY • link 20 months ago by PBC ▴ 10

score 3 · Answer 1 · 2023-03-30

3

20 months ago

ATpoint 85k

For miRNA you should trim the data to the actual expected miRNA length (20-something bp, I would need to look up the exact value). The reason is that with a 76 bp read and, say, a 25 bp miRNA, about two thirds of the sequence content is adapter/technical rather than "true/biological", and this will almost certainly cause alignment problems.

bowtie1 is an ungapped aligner, so it does not allow indels. That is generally good for very short reads, where permitting indels would increase the number of possible mapping locations. bowtie2 is the more recent aligner and (in my feeling) is much more widely used for modern short-read NGS. Both should be fine here for trimmed reads, though for miRNA I would use end-to-end alignment in bowtie2 because of the short length.

I am not much of a short RNA person, but I would try to use existing pipelines for this, maybe something from nf-core, or at least scan the methods sections of miRNA papers to see how they do it. Another question is whether to align to the genome or rather to a miRNA reference... short RNA people, please comment on that. The genome would probably cause a lot of multimapping but captures gDNA contaminants, while a miRNA reference might reduce multimapping but could falsely align contaminants.
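To make the end-to-end suggestion concrete, here is a minimal sketch of a bowtie2 call on already-trimmed reads. The function name, the file names in the usage line, the default of 4 threads, and the choice of the `--very-sensitive` preset are assumptions for illustration, not settings anyone in this thread reported using.

```shell
# Sketch only: align trimmed small RNA reads in end-to-end mode (no soft clipping).
# Arguments: index prefix, trimmed FASTQ, output SAM, optional thread count.
# The function name and the default of 4 threads are hypothetical choices.
align_trimmed_reads() {
  local index_prefix=$1 trimmed_fastq=$2 out_sam=$3 threads=${4:-4}
  bowtie2 --end-to-end --very-sensitive -p "$threads" \
    -x "$index_prefix" -U "$trimmed_fastq" -S "$out_sam"
}

# Usage (file names are placeholders):
#   align_trimmed_reads "$reference/danRer11" trimmed.fastq output.sam 16
```

Unlike `--local`, end-to-end mode requires the whole read to align, so leftover adapter sequence shows up as a low mapping rate instead of being silently clipped.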

ADD COMMENT • link 20 months ago by ATpoint 85k

2

I've just done a lot of small RNA-seq analysis myself and agree with everything here. To OP: I found miRge3.0 to be fast and straightforward. Customization is hard but if you're looking at a model species, then it's a super powerful first step.

My understanding is that it uses cutadapt -> several bowtie1 filtering steps -> then optional novel miRNA discovery, A-to-I editing, tRNA fragments, etc. The only thing it doesn't include is piRNAs, which was surprising considering how thorough it is.

ADD REPLY • link 20 months ago by Trivas ★ 1.8k

0

Hi Trivas

Thank you for your answer.

I ran cutadapt. But when I checked the quality of my trimmed reads, the base quality decreases toward the end of the reads.

Also, a huge number of my reads are 18-35 bp long after trimming, although there are still reads of length 76. Based on my length distribution, the majority of my reads fall in this range after trimming, as is expected for miRNAs.

So, may I ask whether you restricted the length of your trimmed reads when using cutadapt?

Thank you!

ADD REPLY • link 20 months ago by PBC ▴ 10

1

I also had a couple reads that were the max cycles (in my case, 72 bp) but the large majority were 22nt. You should make a read length distribution histogram to confirm if it's something you should worry about. With Cutadapt, I set a min size of 18 bp but I don't think there's a way to set a max size.
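One quick way to get the numbers behind such a histogram is to tabulate read lengths directly from the FASTQ file. This is a sketch that assumes a standard uncompressed FASTQ with exactly four lines per record; the function name is made up for illustration.

```shell
# Sketch only: print "length<TAB>count" for each read length, sorted by length.
# Assumes an uncompressed FASTQ with 4 lines per record (no wrapped sequences).
read_length_distribution() {
  awk 'NR % 4 == 2 { counts[length($0)]++ }
       END { for (len in counts) print len "\t" counts[len] }' "$1" | sort -n
}

# Usage (file name is a placeholder):
#   read_length_distribution trimmed.fastq
```

A sharp peak around 22 nt with a small tail at the full read length would match what is described above.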

ADD REPLY • link 20 months ago by Trivas ★ 1.8k

0

Hi

Thank you for letting me know. I have seen that others also set the minimum length to 18 bp for miRNA analysis. And I agree that I should look at my length distribution before restricting the length during trimming.

As for setting a maximum length, it is possible with the -M parameter:

-M LEN[:LEN2], --maximum-length LEN[:LEN2] Discard reads longer than LEN. Default: no limit
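Putting the length limits from this exchange together, a cutadapt call might look like the sketch below. The 18 bp minimum comes from the thread; the 35 bp maximum, the `-q 20` quality cutoff, and the function name are assumptions for illustration, and the 3' adapter sequence has to come from your own library prep kit.

```shell
# Sketch only: trim the 3' adapter, quality-trim 3' ends, and keep reads
# within a length window. Arguments: input FASTQ, output FASTQ, adapter,
# optional min length (default 18) and max length (default 35, an assumption).
trim_mirna_reads() {
  local in_fastq=$1 out_fastq=$2 adapter=$3 min_len=${4:-18} max_len=${5:-35}
  cutadapt -a "$adapter" -q 20 -m "$min_len" -M "$max_len" \
    -o "$out_fastq" "$in_fastq"
}

# Usage (ADAPTER must be set to the 3' adapter of your kit):
#   trim_mirna_reads reads_file.fastq trimmed.fastq "$ADAPTER"
```

The `-q 20` cutoff is one way to address the drop in base quality at the read ends mentioned earlier, and the `-M` limit discards reads that are still full length after trimming.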

ADD REPLY • link 20 months ago by PBC ▴ 10

0

Thank you ATpoint for answering

For my last attempt with bowtie2, I took the command from a tutorial for small RNA libraries.

As for aligning to the genome versus a miRNA reference, I have not tried it. And I totally agree with your point. But first I need to somehow improve the quality of my trimmed reads. As I said to Trivas, I am struggling a little with that.

I will give the nf-core pipelines a try. They have good stuff there! :)

Thank you!

ADD REPLY • link 20 months ago by PBC ▴ 10

score 0 · Answer 2 · 2023-03-30

0

20 months ago

Buffo ★ 2.4k

I wouldn't say that trimming before alignment is optional depending on the downstream analysis. Your reads might contain sequencing adapters, and if that is the case, you must trim them before mapping to the reference. Maybe that is the reason for your low mapping rate? What does the FastQC report look like?
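As a rough complement to the FastQC adapter content plot, you can count how many reads contain the adapter sequence verbatim. This is a sketch that assumes an uncompressed four-line FASTQ and only finds exact, full-length matches, so it will undercount partial or mismatched adapters; the function name is hypothetical.

```shell
# Sketch only: report how many reads contain an exact copy of the adapter.
# Arguments: FASTQ file, adapter sequence (from your library prep kit).
adapter_fraction() {
  local fastq=$1 adapter=$2
  awk -v adapter="$adapter" '
    NR % 4 == 2 { total++; if (index($0, adapter) > 0) hits++ }
    END {
      if (total > 0)
        printf "%d of %d reads (%.1f%%) contain the adapter\n", hits, total, 100 * hits / total
    }' "$fastq"
}

# Usage (ADAPTER is a placeholder for your kit's 3' adapter):
#   adapter_fraction reads_file.fastq "$ADAPTER"
```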

Besides:

I only want to do DEG analysis, and I am considering not trimming my reads.

I'm not sure if analyzing DEGs is possible if you discard potential precursor sequences in your reads (I mean, it is possible, but how biased might the results be?).

ADD COMMENT • link 20 months ago by Buffo ★ 2.4k

0

Hi

Thank you for your point! But I have been reading some articles and other discussion posts that made me wonder when trimming is really necessary.

Here is one of the articles: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671312/.

It seems that untrimmed reads are beneficial in terms of alignment and computation time.

As for the DEG, sorry, but I did not get your point. Can you explain it further? My main question is whether miRNAs are differentially expressed between samples.

ADD REPLY • link 20 months ago by PBC ▴ 10

0

It seems that untrimmed reads are beneficial in terms of alignment and computation time.

There is no benefit if your reads have adapter contamination. I wouldn't conclude whether or not trimming is needed before taking a look at the FastQC report. Once again, what does it look like?

My main question is whether miRNAs are differentially expressed between samples.

DEG and differential miRNA expression are not the same. That's my point.

ADD REPLY • link 20 months ago by Buffo ★ 2.4k