Question

Unexplained reads from a genome

8.4 years ago

bioinfo_ga ▴ 70

Dear friends,

I have six samples (roughly 36M reads or more per sample) from ribosome profiling. In all samples, >95% of the reads contain adapter sequence, as expected. TopHat (default parameters) was used to align the reads against the human genome (hg38). One sample had an alignment rate of 81.50%; the rest ranged from 0.60% to 9.00%.

Reads that did not align to the human genome were searched against the NCBI nt database using BLAST, but at most 10% of them produced a hit. We also checked for barcode and adapter contamination. I have the following questions:

1. What could be the reason for the lower alignment rate in the other samples?

2. How can I determine where these unaligned reads come from?

Any suggestions would be appreciated.

RNA-Seq TOPHAT next-gen genome alignment • 1.6k views

updated 8.3 years ago by Biostar 20 • written 8.4 years ago by bioinfo_ga ▴ 70

I suggest running FastQC on your samples to see whether there are differences in read quality, overrepresented sequences, etc. between the "good" sample and the others.

8.4 years ago by Carlo Yague 8.9k
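
As a rough complement to the FastQC suggestion, the sketch below tallies the most frequent read sequences in a FASTQ file so they can be compared between the "good" sample and the others. It is not how FastQC itself computes its overrepresented-sequences module; the function name `top_sequences`, the `max_reads` cap, and the in-memory demo records are all illustrative assumptions.

```python
from collections import Counter
from itertools import islice


def top_sequences(fastq_lines, n=5, max_reads=100_000):
    """Return (sequence, count, fraction) for the n most common reads.

    fastq_lines: any iterable of lines from an uncompressed 4-line FASTQ.
    max_reads:   only the first max_reads records are inspected (assumed
                 to be enough for a quick look; adjust as needed).
    """
    counts = Counter()
    total = 0
    # Each FASTQ record spans 4 lines; the sequence is the 2nd line.
    seq_lines = islice(fastq_lines, 1, None, 4)
    for seq in islice(seq_lines, max_reads):
        counts[seq.strip()] += 1
        total += 1
    return [(seq, c, c / total) for seq, c in counts.most_common(n)]


# Tiny made-up example; real input would be a file handle.
demo = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "TTTT", "+", "IIII",
]
for seq, count, frac in top_sequences(demo):
    print(seq, count, f"{frac:.2f}")
```

For a real sample, a file handle can be passed in place of `demo`, for example one opened with `gzip.open(path, "rt")` for a compressed file. Sequences that dominate the low-mapping samples but not the good one are natural candidates for a BLAST search.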

Can you give more information?

0) Did this data come from RNA or DNA?
1) What platform made the reads? e.g. "MiSeq, 2x250bp".
2) How was the library prepared? e.g. target insert size, amplification, etc.
3) How were the reads preprocessed prior to mapping?
4) Why are you using TopHat for mapping? Presumably these are unspliced.
5) How did you check for adapter contamination, and what was the result?
6) Were the samples in different runs or multiplexed together?
7) What do you mean by "In all the sample, >95% of the reads has adapter sequences as expected."? What were you expecting?
8) Is the quality different between "successful" and "unsuccessful" samples?

8.4 years ago by Brian Bushnell 20k
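
Questions 3, 5 and 7 above come down to where the adapter starts in each read and how long the remaining insert is. The sketch below shows one simple way to tabulate that per sample, assuming a 3' adapter located by exact match of its first few bases. The adapter string is a placeholder (the real one depends on the library kit), `trimmed_length_histogram` and `min_match` are hypothetical names, and a dedicated trimmer such as cutadapt, which tolerates mismatches and partial matches, is the better tool for actual preprocessing.

```python
from collections import Counter


def trimmed_length_histogram(reads, adapter, min_match=8):
    """Count insert lengths left after cutting a 3' adapter.

    Only the first min_match bases of the adapter are searched for, by
    exact match, so an adapter that runs off the end of the read is still
    found as long as min_match bases of it are present. Reads with no
    match are tallied under the key None.
    """
    seed = adapter[:min_match]
    hist = Counter()
    for read in reads:
        pos = read.find(seed)  # index where the adapter starts = insert length
        hist[pos if pos != -1 else None] += 1
    return hist


# Placeholder adapter and made-up reads, for illustration only.
adapter = "CTGTAGGCACCATCAAT"
reads = [
    "A" * 28 + adapter[:12],   # 28 nt insert, adapter truncated by read end
    "ACGT" * 5 + adapter,      # 20 nt insert, full adapter
    "ACGT" * 10,               # no adapter present
]
hist = trimmed_length_histogram(reads, adapter)
for length, n in hist.items():
    print(length, n)
```

Comparing these histograms between the well-mapping sample and the others would show whether the poorly mapping samples are dominated by very short or empty inserts, which is one possible explanation for low alignment rates; it would not by itself identify where the reads come from.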