3.9 years ago
kuntalasb
Hello, I am trying to obtain read counts for differential expression of lncRNAs using featureCounts, but every time I get low alignment rates (0.2%, 1.6%, etc.). Since I am new to this field, I have no idea whether I should worry about the alignment rates or proceed. I am using SAM files as input alignment files. Is there any way to increase the alignment rate, or am I making a mistake? My library is paired-end and not strand-specific, and the code that I have used is this:
fc=featureCounts(files,annot.ext="annot.gtf",isGTFAnnotationFile=TRUE,isPairedEnd=TRUE,GTF.attrType="transcript_id")
The output summary:
|| Process SAM file EU1sorted.sam... ||
|| Paired-end reads are included. ||
|| Total alignments : 26777689 ||
|| Successfully assigned alignments : 547535 (2.0%) ||
|| Running time : 29.05 minutes ||
|| Process SAM file EU2sorted.sam... ||
|| Paired-end reads are included. ||
|| Total alignments : 23341158 ||
|| Successfully assigned alignments : 428156 (1.8%) ||
|| Running time : 24.80 minutes ||
|| ||
|| Process SAM file EU3sorted.sam... ||
|| Paired-end reads are included. ||
|| Total alignments : 30994812 ||
|| Successfully assigned alignments : 535538 (1.7%) ||
|| Running time : 32.14 minutes ||
Eagerly waiting for assistance. Thank you!
Are your input data alignments from mRNA-Seq (poly-A selection) against a genome or a transcriptome, and is annot.gtf just lncRNAs? Also, how was annot.gtf obtained? Edit: do you only have two samples (one biological replicate per treatment)?
My input SAM files were generated by aligning the RNA-seq data files to the reference genome, and the library is not poly-A selected. And yes, annot.gtf contains just lncRNAs; it was obtained from the final merged.gtf (which contains all coding/noncoding transcripts) after several filtering steps for lncRNAs. I have three replicates for each sample.
If this is total RNA-Seq, then the rest (~98%) is perhaps ribosomal RNAs and coding RNAs.
Ok, then probably I can carry out the downstream processes without much worrying about the alignment rate? Thank you so much!
I would not say "without much worrying": as with any experiment, problems in the wet lab can have strange influences on the data. I was merely offering a suggestion. The number of counts per sample seems similar based on the data that you have shown. You can search the forum for more suggestions for RNA QC, but perhaps you might check the features of RSeQC from http://rseqc.sourceforge.net/, especially for 5' to 3' coverage bias (edit: so-called gene body coverage, http://rseqc.sourceforge.net/#genebody-coverage-py).
Many thanks for your valuable suggestion!
You are welcome, no problem.