3.3 years ago
Lila M
Hi all,
I'm having serious trouble mapping/aligning single-end reads from an RNA library prep against the human genome (GRCh38). The sample is genomic DNA, but it was prepared with an RNA library kit to preserve strand specificity.
I first tried STAR and Kallisto, and the coverage was very low. Then I tried bowtie2, since the experiment is more like ChIP-seq, and the coverage is still very poor (see the samtools flagstat output):
41115150 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102795 + 0 mapped (0.25% : N/A)
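For anyone who wants to track this number across aligner settings, here is a minimal Python sketch that parses flagstat text like the output above and reports the mapped fraction. The function name `parse_flagstat` is made up for illustration, and the parsing assumes the plain-text `N + M label` layout shown in the post.

```python
import re

# The flagstat lines exactly as posted above.
FLAGSTAT = """\
41115150 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102795 + 0 mapped (0.25% : N/A)
"""

def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} from samtools flagstat text."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+) \+ (\d+) (.+)", line)
        if not match:
            continue  # skip anything that is not a count line
        passed, failed = int(match.group(1)), int(match.group(2))
        # Drop the trailing parenthesised note, e.g. "(0.25% : N/A)".
        label = match.group(3).split(" (")[0].strip()
        counts[label] = (passed, failed)
    return counts

counts = parse_flagstat(FLAGSTAT)
total = counts["in total"][0]
mapped = counts["mapped"][0]
print(f"{mapped} of {total} reads mapped ({100 * mapped / total:.2f}%)")
print(f"{total - mapped} reads did not map")
```
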
Any advice about how to deal with this data? Thank you
Have you confirmed that the data you have is actually from human genome by taking a few reads and blasting them at NCBI? It would not be the first time that someone received a dataset that was not actually what they thought it was.
There are no similarities with the human genome when I run blastn... so I guess that means the sequences in the samples are entirely artifacts, right?
Please also try running it against an RNA database, and try trimming the reads manually (keep only the middle of each read) before running blastn.
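A rough sketch of the "keep only the middle of the read" idea in Python. The helper name `middle_of_read` and the default of 40 bases are arbitrary choices for illustration, not a recommendation from this thread.

```python
def middle_of_read(seq, keep=40):
    """Return the central `keep` bases of a read (the whole read if it is shorter)."""
    if len(seq) <= keep:
        return seq
    start = (len(seq) - keep) // 2
    return seq[start:start + keep]

# Toy example: the flanking A's and T's are dropped, the centre is kept.
print(middle_of_read("AAACCCGGGTTT", keep=6))
```

Trimming both ends this way removes any leftover adapter or low-quality sequence at the read ends before the BLAST search, at the cost of a shorter query.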
Have you tried blasting not only to humans but to a general database, to see against what species it might match?
yes, and nothing comes out
So it's more evidence that the library prep failed and you sequenced artifacts.
I also think so now.
I did not know that this was possible (the worst I've seen was around a 20% mapping rate), but apparently it is.
Can you elaborate on the theory behind that? Genomic DNA is double-stranded and as such does not really have the concept of strand specificity that cDNA synthesized from mRNA has. I guess this unconventional library prep is the reason the mapping is so poor, i.e. it did not work and you sequenced library prep artifacts.
I did not prepare the samples, and for now this is all the information that I have. It is a novel experiment, so the person who did it doesn't really know whether it will work or not. I agree that it could be a bit messy, but as this is what I have, I would like to know if someone has a clue about how to deal with this kind of data/experiment.
To be frank, 0.25% alignment is not "a bit messy"; it is more an indicator of a failed experiment. I think this is not a bioinformatics problem. It seems your team is trying to develop a new experimental technique, but this is the wrong community for that. The wet-lab scientist should try to discuss this (in case you want to do it online) in a Reddit group for molecular biology and NGS. I guess that is where they would get the largest audience these days. If you want to know what the reads you have are, then maybe try blasting a good subset of them against the NCBI nucleotide collection. Still, with failed library preps it is not unusual to simply have cryptic reads that are odd ligation or PCR artifacts with no matches at all.
Thank you very much for your advice!
Looking at the raw data file and blasting some sequences should help to see whether the reads are library prep artifacts or good reads.
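One way to pull a random subset of reads out of the raw file for BLAST, sketched in Python. It assumes a standard four-line-per-record FASTQ (optionally gzipped); the function names and the file names in the usage comment are placeholders, not anything from the original post.

```python
import gzip
import random

def read_fastq(path):
    """Yield (read_name, sequence) from a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line, not needed for BLAST
            yield header[1:].split()[0], seq

def reservoir_sample(records, n, seed=0):
    """Uniformly sample up to n records from an iterator in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < n:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = record
    return sample

def write_fasta(records, path):
    """Write (name, sequence) pairs as FASTA, ready to paste into BLAST."""
    with open(path, "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{seq}\n")

# Usage (hypothetical file names):
# subset = reservoir_sample(read_fastq("reads.fastq.gz"), n=100)
# write_fasta(subset, "subset_for_blast.fasta")
```

Sampling across the whole file rather than taking the first few records avoids looking only at reads from the edge of the flow cell, which tend to be lower quality.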
maybe you forgot to trim adapters?
I don't know what happened but IMO even if you generate random reads some of them will be mapped to such a huge genome as human's.
Good point. Run fastqc in case you haven't, and maybe bowtie2 with --very-fast-local, which allows lenient local alignments by soft-clipping the non-matching parts, and then see whether it looks strikingly different.
Yes, I've trimmed them with bbduk.
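A small Python sketch of the fastqc and bowtie2 suggestion above, building the two command lines. The index prefix, file names, and thread count are placeholders; only the tool options shown (`--outdir`, `--very-fast-local`, `-p`, `-x`, `-U`, `-S`) are standard flags of these tools.

```python
import shlex

def fastqc_cmd(fastq, outdir="fastqc_out"):
    """Command line for a FastQC report (the output directory must already exist)."""
    return ["fastqc", "--outdir", outdir, fastq]

def bowtie2_local_cmd(index_prefix, fastq, sam_out, threads=4):
    """Command line for a single-end bowtie2 run in lenient local mode."""
    return [
        "bowtie2",
        "--very-fast-local",  # local alignment preset; read ends may be soft-clipped
        "-p", str(threads),
        "-x", index_prefix,   # prefix of an existing bowtie2 index
        "-U", fastq,          # unpaired (single-end) reads
        "-S", sam_out,
    ]

# Placeholder paths; substitute your own index and trimmed reads.
print(shlex.join(fastqc_cmd("reads.trimmed.fastq.gz")))
print(shlex.join(bowtie2_local_cmd("GRCh38_index", "reads.trimmed.fastq.gz", "local.sam")))

# To actually run them:
# import subprocess
# subprocess.run(bowtie2_local_cmd("GRCh38_index", "reads.trimmed.fastq.gz", "local.sam"), check=True)
```

If the local-mode mapping rate stays close to the 0.25% reported above, that supports the library-prep-artifact explanation rather than an adapter or trimming problem.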