Question

Very low coverage when mapping genomic DNA

0

3.3 years ago

Lila M ★ 1.3k

Hi all, I'm having serious trouble mapping/aligning single-end reads from human samples to the human genome (GRCh38). The material is genomic DNA, but the library was prepared with an RNA library kit to preserve strand specificity. I first tried STAR and Kallisto, and the coverage was very low. I then tried bowtie2, as the experiment is more like ChIP-seq, and the coverage is still very poor (see the samtools flagstat output below):

41115150 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102795 + 0 mapped (0.25% : N/A)
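
For reference, the mapping rate can be recomputed from the flagstat lines above. This is only a minimal sketch: `mapping_rate` is a hypothetical helper written for illustration, not part of samtools, and it assumes the plain-text flagstat layout shown in this post.

```python
import re

def mapping_rate(flagstat_text):
    """Return (total, mapped, percent_mapped) parsed from samtools flagstat text."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        # Each line looks like "<QC-passed> + <QC-failed> <label ...>"
        m = re.match(r"\s*(\d+) \+ (\d+) (.*)", line)
        if not m:
            continue
        passed, label = int(m.group(1)), m.group(3)
        if label.startswith("in total"):
            total = passed
        elif label.startswith("mapped"):
            mapped = passed
    if total is None or mapped is None:
        raise ValueError("could not find 'in total' and 'mapped' lines")
    percent = 100.0 * mapped / total if total else 0.0
    return total, mapped, percent

# The flagstat output quoted in the question
FLAGSTAT = """41115150 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102795 + 0 mapped (0.25% : N/A)"""

total, mapped, percent = mapping_rate(FLAGSTAT)
print(f"{mapped} of {total} reads mapped ({percent:.2f}%)")
```

With these numbers, roughly 1 read in 400 aligned.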

Any advice about how to deal with this data? Thank you

mapping coverage DNA genomic • 2.6k views

ADD COMMENT • link updated 3.3 years ago by ATpoint 85k • written 3.3 years ago by Lila M ★ 1.3k

2

Have you confirmed that the data you have is actually from the human genome by taking a few reads and blasting them at NCBI? It would not be the first time that someone received a dataset that was not actually what they thought it was.
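
One way to pull "a few reads" for a BLAST check is to sample them at random from the FASTQ file and write them out as FASTA. The sketch below is an illustration only: `sample_fastq_to_fasta` and the file names in the usage note are hypothetical, and it assumes standard four-line FASTQ records.

```python
import gzip
import random

def sample_fastq_to_fasta(fastq_path, fasta_path, n=20, seed=0):
    """Reservoir-sample up to n reads from a FASTQ file and write them as FASTA.

    Assumes four lines per record (header, sequence, '+', qualities).
    Returns the number of reads written.
    """
    rng = random.Random(seed)
    opener = gzip.open if fastq_path.endswith(".gz") else open
    sample = []
    seen = 0
    with opener(fastq_path, "rt") as fh:
        while True:
            header = fh.readline()
            if not header:
                break  # end of file
            seq = fh.readline().strip()
            fh.readline()  # '+' separator line
            fh.readline()  # quality string
            record = (header[1:].strip(), seq)
            seen += 1
            if len(sample) < n:
                sample.append(record)
            else:
                # Replace an existing entry with probability n / seen
                j = rng.randrange(seen)
                if j < n:
                    sample[j] = record
    with open(fasta_path, "w") as out:
        for name, seq in sample:
            out.write(f">{name}\n{seq}\n")
    return len(sample)
```

For example, `sample_fastq_to_fasta("reads.fastq.gz", "subset.fasta", n=20)` would write 20 reads that can be pasted into the NCBI blastn web form. Sampling across the whole file avoids looking only at the first reads, which often come from the edge of the flow cell and can be of lower quality.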

ADD REPLY • link 3.3 years ago by GenoMax 147k

0

There are no similarities with the human genome when I run blastn... so I guess that means the sequences in the samples are entirely artifacts, doesn't it?

ADD REPLY • link 3.3 years ago by Lila M ★ 1.3k

0

Please check whether you ran it against an RNA database. Also try trimming the reads manually (take only the middle of each read) and run blastn again.
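
Taking "only the middle of the read" can be done with a small slicing helper. This is a sketch under assumptions: `middle_of_read` is a hypothetical name, and the number of bases to keep is arbitrary and should be chosen to suit the read length.

```python
def middle_of_read(seq, keep=50):
    """Return the central `keep` bases of a read (the whole read if it is shorter)."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    if len(seq) <= keep:
        return seq
    start = (len(seq) - keep) // 2
    return seq[start:start + keep]

# Toy example: a 16-base read trimmed to its central 8 bases
print(middle_of_read("AAAACCCCGGGGTTTT", keep=8))  # CCCCGGGG
```

Dropping both ends removes the positions most likely to carry adapter or primer sequence, so whatever remains is a better query for blastn.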

ADD REPLY • link 3.3 years ago by German.M.Demidov ★ 2.9k

0

Have you tried blasting not only against human but against a general database, to see which species the reads might match?

ADD REPLY • link 3.3 years ago by GenomeXP • 0

0

Yes, and nothing comes up.

ADD REPLY • link 3.3 years ago by Lila M ★ 1.3k

2

So it's more evidence that the library prep failed and you sequenced artifacts.

ADD REPLY • link 3.3 years ago by ATpoint 85k

0

I also think so now.

I did not know this was possible (the worst I've seen was around a 20% mapping rate), but apparently it is.

ADD REPLY • link 3.3 years ago by German.M.Demidov ★ 2.9k

1

It is genomic DNA but was prepared by using a RNA library kit to preserve strand specificity.

Can you elaborate on the theory behind that? Genomic DNA is double-stranded and as such does not really have the concept of strand specificity that cDNA synthesized from mRNA has. I guess this unconventional library prep is the reason the mapping is so poor, i.e. it did not work and you sequenced library prep artifacts.

ADD REPLY • link 3.3 years ago by ATpoint 85k

0

I did not prepare the samples, and for now this is all the information I have. It is a novel experiment, so the person who did it doesn't really know whether it will work or not. I agree that it could be a bit messy, but as this is what I have, I would like to know if someone has a clue about how to deal with this kind of data/experiment.

ADD REPLY • link 3.3 years ago by Lila M ★ 1.3k

1

To be frank, 0.25% alignment is not "a bit messy"; it indicates a failed experiment. I don't think this is a bioinformatics problem. It seems your team is trying to develop a new experimental technique, and this is the wrong community for that. The wet-lab scientist should discuss this (if they want to do it online) in a Reddit group for molecular biology and NGS, which I guess is where they would reach the largest audience these days. If you want to know what these reads are, try blasting a good subset of them against the NCBI nucleotide collection. Still, with failed library preps it is not unusual to have cryptic reads that are odd ligation or PCR artifacts with no matches at all.

ADD REPLY • link 3.3 years ago by ATpoint 85k

0

Thank you very much for your advice!

ADD REPLY • link 3.3 years ago by Lila M ★ 1.3k

0

Looking at the raw data file and blasting some sequences should help show whether the reads are library prep artifacts or good reads.

ADD REPLY • link 3.3 years ago by GenomeXP • 0

0

Maybe you forgot to trim adapters?

I don't know what happened, but IMO even if you generate random reads, some of them will map to a genome as large as the human one.
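
A quick way to test the adapter question is to count how many reads contain the start of the adapter sequence. The sketch below is illustrative only: `adapter_fraction` is a hypothetical helper, and the default `AGATCGGAAGAGC` is the start of the common Illumina TruSeq adapter, which is an assumption here because the kit used in this experiment is not known.

```python
# Start of the common Illumina TruSeq adapter.
# Assumption: the library kit used here may carry a different adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def adapter_fraction(reads, adapter=ADAPTER_PREFIX):
    """Return the fraction of reads that contain the adapter sequence."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(adapter in read.upper() for read in reads)
    return hits / len(reads)

# Toy example: one of four reads carries the adapter
toy_reads = [
    "ACGTACGTACGTACGT",
    "TTGACCAGATCGGAAGAGCACACG",
    "GGGGCCCCAAAATTTT",
    "ACACACACACACACAC",
]
print(adapter_fraction(toy_reads))  # 0.25
```

A high fraction, especially with the adapter starting near the beginning of the read, would point to very short inserts or adapter dimers rather than untrimmed but otherwise usable reads.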

ADD REPLY • link 3.3 years ago by German.M.Demidov ★ 2.9k

0

Good point. Run FastQC in case you haven't, and maybe bowtie2 with --very-fast-local, which allows lenient local alignments that soft-clip the non-matching parts, and then see whether the result looks strikingly different.
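
The bowtie2 call suggested here could be assembled as follows. This is a sketch only: `bowtie2_local_cmd`, the index prefix, and the file names are hypothetical placeholders, and the code only builds and prints the command rather than running it.

```python
import shlex

def bowtie2_local_cmd(index_prefix, fastq, sam_out, threads=4):
    """Build a bowtie2 command for single-end reads using the --very-fast-local preset."""
    return [
        "bowtie2",
        "--very-fast-local",   # local alignment preset; read ends may be soft-clipped
        "-p", str(threads),    # number of alignment threads
        "-x", index_prefix,    # prefix of the bowtie2 index
        "-U", fastq,           # unpaired (single-end) reads
        "-S", sam_out,         # SAM output file
    ]

# Placeholder names; substitute the real index and read files
cmd = bowtie2_local_cmd("GRCh38_index", "reads.fastq.gz", "reads.local.sam")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`. Comparing the samtools flagstat output of this run with the end-to-end run shows whether soft-clipping rescues a meaningful share of reads.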