Question

bowtie (1) 0% alignment on paired-end RNAseq data

0

5.6 years ago

gulamaltab • 0

Hi

I am trying to align paired-end miRNA-seq data using bowtie. I have tried a number of times using different options; however, it seems like I am doing something wrong: I am getting a 0.3-1% alignment rate. I have tried bowtie2 which gave me 96% alignment rate.

I used the default options first:

bowtie  ~/work/BWA/refrat.fa -p 2 -1 19-8883-R1.fastq -2 19-8883-R2.fastq -S 19-8883.sam 2> 19.log

I have also used these options: bowtie -q -n 0 -e 80 -l 18 -a -m 5 --best --strata

And these options: bowtie -q -v 1 -a -m 5 --best --strata

But I kept getting similar results, as follows:

# reads processed: 32291169
# reads with at least one reported alignment: 79980 (0.25%)
# reads that failed to align: 32211189 (99.75%)
Reported 172518 paired-end alignments to 1 output stream(s)

Any suggestions? Thanks
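
As a quick sanity check on the summary above, here is a minimal sketch that parses the three count lines and recomputes the rate. The function name `parse_bowtie_summary` is hypothetical, and the patterns simply assume the log looks exactly like the text pasted in the post.

```python
import re

# Summary text copied from the post above (bowtie v1 writes it to stderr).
LOG = """\
# reads processed: 32291169
# reads with at least one reported alignment: 79980 (0.25%)
# reads that failed to align: 32211189 (99.75%)
Reported 172518 paired-end alignments to 1 output stream(s)
"""

def parse_bowtie_summary(text):
    """Return the read counts from a bowtie (v1) summary as a dict of ints."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"line for {key!r} not found in log")
        counts[key] = int(match.group(1))
    return counts

counts = parse_bowtie_summary(LOG)
rate = 100 * counts["aligned"] / counts["processed"]
print(f"alignment rate: {rate:.2f}%")
```

The aligned and failed counts add up to the number processed, so the log is internally consistent and the low rate is real rather than a reporting glitch.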

bowtie RNA-Seq alignment miRNA • 4.6k views

ADD COMMENT • link updated 5.6 years ago by ATpoint 85k • written 5.6 years ago by gulamaltab • 0

0

What if you try to align R1 or R2 independently? There might be an issue with insert sizes.

ADD REPLY • link 5.6 years ago by igor 13k

0

I will try that. If I do, how do I combine the files in the end? Is it possible?

ADD REPLY • link 5.6 years ago by gulamaltab • 0

ATpoint · Answer 1 · 2019-04-16

1

5.6 years ago

ATpoint 85k

My guess is that your raw data are not properly (or not at all) trimmed for adapter sequences. In miRNA-seq, your targets are typically small, usually < 30 bp. Your read length is probably 2x50 bp or longer, so you will pick up adapter content. Bowtie2, unlike Bowtie, supports local alignment (its --local mode; end-to-end is the default), which means it can soft-clip non-matching (= adapter) content while still aligning the local part of the read that matches the reference. With Bowtie, the read will probably go unaligned due to the many mismatches. Please run FastQC and post the adapter content part (How to add images to a Biostars post), or if you did trimming, show the command line.

Also, just to be sure: your index is in a folder called BWA. I hope this is a coincidence and you are not trying to use a bwa index with bowtie?
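
To make the read-through argument concrete, here is a toy sketch. Everything in it is illustrative: the 22 nt insert, the flanking "genome" and the choice of a common Illumina adapter sequence are assumptions, not taken from this dataset. It compares an untrimmed 50 nt read end-to-end against the reference and counts mismatches.

```python
READ_LEN = 50
INSERT = "TGAGGTAGTAGGTTGTATAGTT"             # 22 nt miRNA-like insert (illustrative)
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # assumed Illumina-style 3' adapter
FLANK = "CCGTTAGGCTAACGTTAGCATGCATGCCTA"      # made-up genomic sequence after the insert

# The genome continues with FLANK, but the read continues into the adapter.
reference = INSERT + FLANK
read = (INSERT + ADAPTER)[:READ_LEN]

def mismatches(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

adapter_bases = max(0, READ_LEN - len(INSERT))
untrimmed = mismatches(read, reference[:READ_LEN])
trimmed = mismatches(read[:len(INSERT)], reference[:len(INSERT)])
print(f"adapter bases in read: {adapter_bases}")
print(f"mismatches untrimmed: {untrimmed}, trimmed: {trimmed}")
```

Bowtie's -v mode accepts at most 3 mismatches over the whole read, so a read like this cannot align end-to-end until the adapter tail is removed.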

ADD COMMENT • link 5.6 years ago by ATpoint 85k

0

Hi ATpoint, I have checked the FastQC reports for all my samples and the adapters were trimmed. I have uploaded a picture for one of the samples. Yes, I built the index using bowtie-build; BWA is just the folder name in the example above.

[Image: FastQC plot for one sample]

ADD REPLY • link updated 5.6 years ago by ATpoint 85k • written 5.6 years ago by gulamaltab • 0

1

Something does not sound right about this.

Do you know why 150 bp PE sequencing was chosen if you are only looking at miRNA/small RNA libraries?
Have you checked to see what kit was used for creating these libraries? Many miRNA kits have specific instructions that include specific adapter sequence to look for (FastQC will not know about this) and how the data needs to be processed to remove these extraneous sequences before alignment.

I have tried bowtie2 which gave me 96% alignment rate.

Against the same genome? Have you looked in the alignment file to see if a large part of the read(s) is getting soft-clipped? I have a feeling that would be the case.

Since bowtie v.1 can't do gapped alignments, it is unable to use the entire read (each read likely contains extraneous sequence), so you are getting that poor alignment percentage. Proper trimming, as suggested by @ATpoint, should help alleviate this.
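
One way to check for soft-clipping is to look at the CIGAR column (column 6) of the SAM output, where clipped bases appear as `S` operations. Below is a minimal sketch; the helper names and the example records are hypothetical and only the first six SAM columns matter.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_fraction(cigar):
    """Return the fraction of read bases that are soft-clipped (S) in a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    # M, I, S, = and X consume read bases; D, N, H and P do not.
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0

def clipping_report(sam_lines):
    """Yield (read name, soft-clipped fraction) for each mapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 4 or cigar == "*":  # unmapped read
            continue
        yield name, soft_clipped_fraction(cigar)

# Hypothetical records, not taken from the poster's data.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t22M128S\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t42\t150M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
for name, frac in clipping_report(example):
    print(f"{name}\t{frac:.2f}")
```

For real data, pass an open SAM file handle instead of the list. If most records look like `r1` (a short match followed by a long clip), the high bowtie2 rate would be coming from short local matches rather than full-length reads.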

ADD REPLY • link 5.6 years ago by GenoMax 147k

0

Hi, thanks for the reply. I am not sure why 150 bp PE was chosen; Illumina only gave me this option. As for the library, the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina was used.

Yes, 96-97% alignment rate against the same genome. I am new to RNA-seq and not sure how to check for soft clipping; I will have a look at it.

The raw FASTQ files were trimmed for the presence of Illumina adapter sequences using Cutadapt version 1.2.1. The option -O 3 was used, so the 3' end of any read that matches the adapter sequence for 3 bp or more is trimmed.

The reads were further trimmed using Sickle version 1.200 with a minimum window quality score of 20. Reads shorter than 20 bp. after trimming were removed.

Thanks for your input.
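
The two trimming steps described above can be sketched roughly as follows. This is a deliberately simplified, exact-match illustration of a minimum-overlap rule and a minimum-length filter; it is not Cutadapt's or Sickle's actual algorithm (Cutadapt tolerates errors, Sickle trims by quality), and the sequences and function names are made up.

```python
ADAPTER = "AGATCGGAAGAGC"          # assumed adapter prefix (illustrative)
INSERT = "TGAGGTAGTAGGTTGTATAGTT"  # 22 nt miRNA-like insert (illustrative)

def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut the read at the first exact adapter match.

    A full adapter anywhere in the read is removed together with everything
    after it. A partial adapter at the 3' end is removed only if at least
    min_overlap bases match (the role of -O in the post above).
    """
    for start in range(len(read)):
        tail = read[start:]
        if tail.startswith(adapter):
            return read[:start]
        if len(tail) >= min_overlap and adapter.startswith(tail):
            return read[:start]
    return read

def passes_length_filter(read, min_len=20):
    """Keep reads that are at least min_len bases long after trimming."""
    return len(read) >= min_len

reads = [
    INSERT + ADAPTER + "ACAC",  # full adapter: trimmed back to the insert
    INSERT + "AGA",             # 3-base partial adapter: trimmed
    INSERT + "AG",              # 2-base overlap: below min_overlap, kept as is
    INSERT[:19] + ADAPTER,      # 19 nt insert: trimmed, then dropped by length
]
trimmed = [trim_3prime_adapter(r, ADAPTER) for r in reads]
kept = [r for r in trimmed if passes_length_filter(r)]
print(len(trimmed), len(kept))
```

Note how the last read is lost: a 19 nt insert is trimmed correctly and then discarded by the 20 bp minimum length.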

ADD REPLY • link 5.6 years ago by gulamaltab • 0

0

What does the size distribution plot look like if you run FastQC on the trimmed FastQ files?

ADD REPLY • link 5.6 years ago by Friederike 9.0k

0

This is the graph of the size distribution.

ADD REPLY • link 5.6 years ago by gulamaltab • 0

0

That is interesting.

What is R0/R1/R2? Is R0 the merged representation of R1/R2 reads?

So most of your reads (after trimming?) are around 20-30 bp. Can you clarify each step you have done (starting with raw data) to get this plot? Include command lines for all programs you used in the process (use dummy paths/file names, if you want to).
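
If it helps to double-check the plot, the length distribution can also be tallied directly from the trimmed FASTQ. A minimal sketch, assuming an uncompressed file with exactly four lines per record; the function name and the tiny in-memory example are made up.

```python
import io
from collections import Counter

def read_length_counts(handle):
    """Count read lengths in an uncompressed 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts

def fastq_record(name, seq):
    """Build one FASTQ record with a dummy quality string of matching length."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"

# Tiny made-up example; for real data, pass an open file handle instead.
fastq = io.StringIO(
    fastq_record("r1", "TGAGGTAGTAGGTTGTATAGTT")
    + fastq_record("r2", "TGAGGTAGTAGGTTGTATAG")
    + fastq_record("r3", "TGAGGTAGTAGGTTGTATAGTT")
)
counts = read_length_counts(fastq)
for length in sorted(counts):
    print(length, counts[length])
```

Running this on the R1 and R2 files separately, before and after trimming, would show at which step the reads end up at 20-30 bp.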

ADD REPLY • link 5.6 years ago by GenoMax 147k

0

NEBNext® Ultra™ Directional RNA Library Prep Kit

Are you sure this kit is the appropriate/right choice for small RNA libraries? I don't see any mention of small RNAs after a cursory look at the web link.

You may have received normal RNAseq libraries.

Reads shorter than 20 bp. after trimming were removed.

Yow, that may be some of the small RNA data you want :-)

ADD REPLY • link 5.6 years ago by GenoMax 147k

0

Hmm, I have used the same data, just the one strand (R1), in the mirdeep2 package, which uses bowtie 1 to align the reads. That gave an overall alignment rate of 82%. Could it be something to do with paired-end alignment?

ADD REPLY • link 5.6 years ago by gulamaltab • 0

0

I don't know if mirdeep2 post-processes the reads in some way, since bowtie v.1 is unable to align them on its own.

ADD REPLY • link 5.6 years ago by GenoMax 147k

0

Hmm, that makes little "biological" sense. Are you sure this is a smallRNA-seq dataset? Is this your data or published (from GEO or so)? With smallRNA you must (should) pick up adapter content at this read length. Maybe the adapter sequence is not known to FastQC, but this implies non-standard library prep. How was the library made?

ADD REPLY • link 5.6 years ago by ATpoint 85k

0

Hi ATpoint. Yes, it is a smallRNA-seq data set, and yes, it is my own data. Please kindly have a look at the above reply to @genomax regarding the library.

ADD REPLY • link 5.6 years ago by gulamaltab • 0