Identifying and classifying the most abundant sequences in small RNAseq data
2.6 years ago
Mauro ▴ 20

Hi, I have small RNA-seq results for a human sample. I mostly care about seeing what the top sequences are and trying to classify them (tRNA, rRNA, mRNA, lncRNA, etc.).

I'm running into an issue when mapping straight to the human genome (HISAT2): most reads are multi-mapped to highly conserved regions, so the gene/region counts (htseq-count) are wildly off compared with the top 100 "Overrepresented sequences" I get from the FastQC report.

I need to keep the original sequence information at hand, so after mapping to a genome or RNA database I need to look up what each sequence in my "top 100" matched. Is there a way for me to run these queries against my BAM file? Or maybe a better way of doing this?

Thanks!

hisat2 srna bam rna-seq • 526 views
2.6 years ago

How about running the nf-core/smrnaseq pipeline as a starting point?

An nf-core pipeline is of course an opinionated way of running an analysis, but these pipelines usually bundle most of the standard tools for a specific type of analysis. You can always dig deeper into the intermediate results with a custom approach later, but there is typically no point in reinventing the wheel from the start.
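If you do end up digging into the alignments yourself, the lookup you describe (which loci did each of my top sequences align to?) could be sketched roughly like this. This is only a minimal sketch, not part of any pipeline: it assumes you feed it SAM text, for example from `samtools view`, and the names `locate_sequences` and `reverse_complement` are made up for illustration. Reads aligned to the reverse strand are stored reverse-complemented in SAM, so they are flipped back before comparing with the original sequence.

```python
from collections import Counter, defaultdict

# Translation table used to complement a DNA sequence.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_sequences(sam_lines, top_sequences):
    """Map each sequence of interest to the loci its reads aligned to.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    top_sequences: iterable of read sequences as they appear in the FASTQ.
    Returns {sequence: Counter({(reference, position, strand): n_alignments})}.
    """
    wanted = set(top_sequences)
    hits = defaultdict(Counter)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, pos, seq = fields[2], int(fields[3]), fields[9]
        if flag & 0x4 or seq == "*":
            continue  # unmapped read, or record without a stored sequence
        reverse = bool(flag & 0x10)
        # Recover the sequence as it was in the FASTQ file.
        original = reverse_complement(seq) if reverse else seq
        if original in wanted:
            hits[original][(rname, pos, "-" if reverse else "+")] += 1
    return hits
```

Every alignment record of a multi-mapped read is counted separately, so a sequence that hits many conserved loci will show all of them. That may be exactly what you want when deciding whether it looks like a tRNA or rRNA fragment. The resulting coordinates would still need to be compared against an annotation to get a class label.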
