Greeting.
I downloaded a dataset from my species of interest (link). I checked the publication but no information was given about the library preparation they used. To infer the standess of my sample I ran salmon on the data. Salmon then inferred the most likely library type to be ISR (inward, stranded reverse)
After trimming my data I aligned the samples using hisat2 and performed quantification using featureCounts using the following commands:
hisat2 --dta --rf -x ../indexes/cro_index -1 fastq_trimmed/$SRA\_1_paired.trimmed.fastq -2 fastq_trimmed/$SRA\_2_paired.trimmed.fastq | samtools view -C -T ../cro_v2_asm.fasta - | samtools sort > cram_files_trimmed/$SRA.cram;
featureCounts -p -a -s2 transcriptome_assembly.mRNA.gtf -t exon -g gene_id -o featureCounts/$SRA.tsv cram_files_trimmed/$SRA.cram
However, only around 30-40~ of the aligned reads were assigned to a feature. Reruning featureCounts without the -s2 option increased the assignment to 60-70% which is more in line with other samples from the same species.
I'm not sure if I'm misinterpreting the -s option in featureCounts and the -s2 option only considers reads mapped in the reverse strand. Should featureCoutns be used with -s 0 in this case? Am I missing something?
Thanks in advance
You should ideally run
salmon
using the whole genome sequence as decoy (How does salmon deal with decoy? ). That is the recommendation fromsalmon
developers. You are trying to compare two different methodologies so it is not surprising that there is no 1:1 correlation in read assignment.featureCounts
ignores multi-mapped reads.salmon
excels at assigning them to transcripts.Comparison at DE result level is what you should look at.
Hi! I think you missed the point my question. I will update the post to make it more clear. I was not trying to compare salmon with featureCounts. I was not even trying to quantify using salmon. The only thing I wanted from salmon was to understand what type of library preparation method was used (i.e stranded or not stranded, fr or rf).
My objective is to quantify my samples using hisat2 + featureCounts
Way your post was written and the title
quantification of stranded RNAseq
led to my comment above.In light of explanation take a look at
infer_experiment.py
from RSeQC and/or another tool GUESSmyLT. Examining the alignments in IGV is also helpful (Answer: Infering strand specificity of bam files using rseqc? )I used the RSeQC script and I obtained the following results:
This is PairEnd Data
From this result, it appears my library is not stranded. I find this weird since aligning with hisat2 with the --rf option had a > 95% alignment ratio
--rf
does not change HISAT2 alignment algorithm, but it changes whether the read will beproperly paired
or not. Read mapped in proper pair is a flag (0x2) in sam/bam alignment format that highlight read pairs that mapped on the same chromosome, in a reasonable distance from each other, and in proper orientation (given by--rf
in this case). The number of properly mapped reads is probably lower than it should be in your alignment file because of this.