Question

mapping rate of small rna-seq with different references


6.7 years ago

woongjaej ▴ 30

Hi, folks

I'm new to analysing small RNA-seq and have some questions. I hope someone experienced with small RNA-seq can give me some advice.

I'm mapping my single-end smRNA-seq data to the hg19 and hg38 references. I used the CAP-miRSeq pipeline, so the aligner was bowtie. When I checked the mapped reads in the resulting BAM files with samtools flagstat, I was surprised. The BAM mapped to hg19 had about 50,000,000 mapped reads, while the BAM mapped to hg38 had about half as many. I have three other samples and tried all of them. Here are the flagstat results.

hg19

58275643 + 0 mapped (90.82% : N/A)
25226589 + 0 mapped (87.94% : N/A)
36270257 + 0 mapped (86.49% : N/A)
27601897 + 0 mapped (91.43% : N/A)

hg38

23224974 + 0 mapped (53.08% : N/A)
18395834 + 0 mapped (74.52% : N/A)
20368027 + 0 mapped (62.12% : N/A)
17979959 + 0 mapped (73.06% : N/A)

Could this be possible??

I'm going to run a differential expression analysis with these data, and I'm confused about how to get a raw count file from smRNA-seq data. This is different from ordinary RNA-seq, right? Could someone give me some pointers on how to do this? Should I use a normal GTF file or a miRNA database's GTF file (such as hairpin.gtf)?

(e.g. when using htseq-count: which GTF or GFF file, which feature type, which ID attribute to use, etc.)

Thank you very much for your help!

smRNA-seq mapping rate reference • 2.4k views


score 1 · Answer 1 · 2018-03-07

First, I assume your analysis is primarily miRNA detection.

smRNA reads are very short sequences, so they tend to map to many places along the genome. The discrepancy in mapping percentage can have many causes, such as masking, incorrect concatenation of the chromosome files, or differences in mapping parameters. So I suggest two things: first filter out the reads that map to Rfam or transcriptome sequences (use the --norc option for the transcriptome), then map the remaining reads to the latest human genome. That leaves you with all the putative miRNA reads to be considered for miRNA detection.
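One quick sanity check on the numbers in the question: flagstat reports the mapped count and the mapped percentage, so the total number of records in each BAM can be back-calculated as mapped / (percentage / 100). The sketch below does this for the first sample's two lines. The function name is only illustrative, and the regular expression assumes the "mapped" line format exactly as pasted in the question.

```python
import re

def total_from_mapped_line(line):
    """Back-calculate the total record count from a flagstat 'mapped' line.

    Assumes the format shown in the question, e.g.
    '58275643 + 0 mapped (90.82% : N/A)'.
    """
    match = re.search(r"(\d+) \+ \d+ mapped \(([\d.]+)%", line)
    if match is None:
        raise ValueError("not a flagstat 'mapped' line: " + line)
    mapped = int(match.group(1))
    percent = float(match.group(2))
    return mapped / (percent / 100.0)

# First sample from the question, mapped to each reference.
hg19_total = total_from_mapped_line("58275643 + 0 mapped (90.82% : N/A)")
hg38_total = total_from_mapped_line("23224974 + 0 mapped (53.08% : N/A)")

print(f"hg19 total records: {hg19_total / 1e6:.1f} million")
print(f"hg38 total records: {hg38_total / 1e6:.1f} million")
```

For this sample the implied totals come out at roughly 64 million records for hg19 and 44 million for hg38. If the same FASTQ went into both runs, the totals would be expected to match, so a gap like this may point to the two BAMs not holding the same set of records (for example, different handling of multi-mapping reads or different run parameters) rather than to the reference alone. That is an inference from the arithmetic, not something confirmed in the thread.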

Differential miRNA expression is actually much simpler than for RNA-seq. It can readily be done with the miRDeep2 pipeline; then you just have to convert the raw counts to CPM and compute fold changes (or perhaps put the raw counts through DESeq2). If you want to do it manually:

1. Collapse the reads (the collapsed values are your read counts).
2. Map the collapsed read file to the human mature miRNAs from miRBase (get the mappings in bowtie's default output format, not BAM).
3. From the bowtie output file, add each collapsed read's count to the mature miRNA it mapped to (when multiple reads map to the same miRNA, sum the values).

Then use CPM or DESeq2 for differential expression.
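The manual counting and CPM conversion can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the miRDeep2 implementation: it assumes collapsed read IDs end in `_x<count>` (for example `seq_0_x120`), and that the bowtie default output is tab-separated with the read name in column 1 and the reference name in column 3. The function names and the example lines are made up for illustration.

```python
from collections import defaultdict

def collapsed_count(read_id):
    # Assumption: the collapsing step names reads like "seq_0_x120",
    # where the number after "_x" is how many times the sequence occurred.
    return int(read_id.rsplit("_x", 1)[1])

def counts_per_mirna(bowtie_lines):
    """Sum collapsed read counts per reference (mature miRNA) name."""
    counts = defaultdict(int)
    for line in bowtie_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        read_id, mirna = fields[0], fields[2]
        counts[mirna] += collapsed_count(read_id)
    return dict(counts)

def to_cpm(counts):
    """Convert raw counts to counts per million mapped miRNA reads."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value * 1_000_000 / total for name, value in counts.items()}

# Made-up bowtie default-output lines (read, strand, reference, offset, sequence, ...).
example_lines = [
    "seq_0_x120\t+\thsa-miR-21-5p\t0\tTAGCTTATCAGACTGATGTTGA\tIIIIIIIIIIIIIIIIIIIIII\t0\t",
    "seq_1_x30\t+\thsa-miR-21-5p\t0\tTAGCTTATCAGACTGATGTTG\tIIIIIIIIIIIIIIIIIIIII\t0\t",
    "seq_2_x50\t+\thsa-let-7a-5p\t0\tTGAGGTAGTAGGTTGTATAGTT\tIIIIIIIIIIIIIIIIIIIIII\t0\t",
]

raw_counts = counts_per_mirna(example_lines)
cpm = to_cpm(raw_counts)
print(raw_counts)
print(cpm)
```

Note that DESeq2 expects the raw counts, not CPM, as input. Also, if bowtie is run so that it reports more than one alignment per read, this sketch adds the read's count to every miRNA it hits, which may or may not be what you want.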