Question

very poor alignment on total RNA data from exosome

10.4 years ago

nextgenseq.bioinfo • 0

Hello Gentle People,

I have human RNA-seq data from a total RNA library prepared from an exosome sample with the NuGEN Ovation kit 2. I quality-trimmed the reads and checked the usual quality metrics with FastQC. Everything looked fine except duplication, which was 90%. When I aligned the 13M SE reads to the hg19 genome using bowtie2, the overall alignment rate was only 3%!

The top 20 duplicated reads account for about a quarter of a million reads in total. I ran a BLAST search against the nr nucleotide database, and every one of the 20 reads matched elongation factor 1A mRNA CDS with 100% identity. Yet these reads had no genome alignment! They also matched Mycoplasma 28S ribosomal RNA with 100% identity.
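For anyone who wants to reproduce the "top duplicated reads" step, here is a minimal Python sketch. It is not the script used for the numbers above; the function names `read_sequences` and `top_duplicated` are hypothetical, and it assumes a standard 4-line-per-record FASTQ file (plain or gzipped) that fits in memory as a table of distinct sequences.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every record: line indices 1, 5, 9, ...
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def top_duplicated(path, n=20):
    """Return (total_reads, [(sequence, count), ...]) for the n most frequent reads."""
    counts = Counter(read_sequences(path))
    total = sum(counts.values())
    return total, counts.most_common(n)
```

The returned sequences can be written out as FASTA records and pasted into BLAST. Comparing the summed counts of the top entries against `total` shows how much of the library a handful of sequences occupies.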

It's not clear to me why the genome alignment rate is so low. I would appreciate your insightful comments on what might be happening here.

Thanks.

next-gen alignment genome RNA-Seq • 4.1k views

updated 3.0 years ago by Ram 44k • written 10.4 years ago by nextgenseq.bioinfo • 0

How long are your reads? With longer reads, you'll want to use a splice-aware RNA-seq aligner such as STAR or TopHat.

updated 3.0 years ago by Ram 44k • written 10.4 years ago by Sean Davis 27k

The reads are 50 bp long. I have had no issues running bowtie on other 50 bp SE datasets, although those were not from exosomes.

10.4 years ago by nextgenseq.bioinfo • 0

I don't think short reads necessarily mean you shouldn't use TopHat2. TopHat2 works with 50 bp reads, and by not using it you are most likely throwing alignments away. I think you should establish whether there are any other contaminants in the library. We preface every RNA-Seq analysis with a run through FastQ Screen (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) to make sure there are no unpleasant surprises at the alignment stage.

updated 3.0 years ago by Ram 44k • written 10.4 years ago by User 59 13k

Thanks. I did check the library against rRNA and tRNA (the matches were insignificant) but not against other potential contaminants such as E. coli, yeast, or mouse. I will set up a fastq_screen run on my library and see what I get.

updated 3.0 years ago by Ram 44k • written 10.4 years ago by nextgenseq.bioinfo • 0

Looks like a poorly prepared library to me. I've seen a paired-end RNA pull-down library with characteristics similar to what you describe: a high proportion of duplicates, poor alignment statistics, and sequences that return no hits when you BLAST them. I never quite figured out why this happened (I didn't make the library myself), but my guess is that the person who prepared it did not use enough input RNA for the library prep and/or the library was sequenced much deeper than its complexity warranted.
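One rough way to check the "sequenced deeper than the complexity of the library" idea is to track how many distinct sequences have been seen as more reads are read in. The sketch below is only an illustration, not a replacement for a dedicated complexity estimator; `complexity_curve` and its `step` parameter are hypothetical names, and it assumes the reads arrive in roughly random order (unsorted FASTQ) and that exact sequence identity is a good enough definition of a duplicate.

```python
def complexity_curve(sequences, step=1_000_000):
    """Return [(reads_seen, distinct_sequences), ...], sampled every `step` reads.

    `sequences` is any iterable of read sequences (strings), for example the
    second line of every FASTQ record.
    """
    seen = set()
    curve = []
    n = 0
    for n, seq in enumerate(sequences, start=1):
        seen.add(seq)
        if n % step == 0:
            curve.append((n, len(seen)))
    # Record the final point if the read count is not a multiple of `step`.
    if n and n % step != 0:
        curve.append((n, len(seen)))
    return curve
```

If the number of distinct sequences flattens out long before the last read, additional depth is mostly producing duplicates, which is consistent with a low-complexity library.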

10.4 years ago by kt8 • 0

This library was probably sequenced together with other libraries. Check the index sequences to make sure these reads really come from your sample (assuming the NuGEN Ovation kit 2 also uses index sequences to distinguish samples, as Illumina kits do). I once ran into a case where the sequencing center had mislabeled the samples, and I was able to correct the mistake based on the index sequences.
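A minimal sketch of that index check in Python follows. It assumes the FASTQ headers use the Illumina CASAVA 1.8+ layout, where the part after the space reads `read:is_filtered:control_number:index`; older header styles, or files where the index was stripped during demultiplexing, would need different parsing. The function names `index_from_header` and `tally_indexes` are hypothetical.

```python
from collections import Counter


def index_from_header(header):
    """Return the index field of a CASAVA 1.8+ style header, or None if absent."""
    parts = header.strip().split(" ", 1)
    if len(parts) != 2:
        return None
    fields = parts[1].split(":")
    # Expected comment layout: read:is_filtered:control_number:index
    if len(fields) == 4 and fields[3]:
        return fields[3]
    return None


def tally_indexes(headers):
    """Count how often each index sequence occurs in an iterable of header lines."""
    return Counter(index_from_header(h) for h in headers)
```

Feeding it the first line of every FASTQ record and comparing the dominant index with the one recorded on the sample sheet shows whether the reads carry the expected barcode.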

10.4 years ago by Dejian ★ 1.3k