Question

Feature Counts: low assigned alignments (1.9%) with -s 0: Unstranded option any assigned alignments with -s 2: reversely strandedoptionts

0

Entering edit mode

5 weeks ago

francesca.catellani • 0

Hi!

I am doing RNA-sequencing analysis. I did STAR alignment (version 2.7.1a) and then Feature Counts (version 1.6.4). I tried the two option, -s 0 (unstranded) and -s 2 (reversely stranded.

The code attached below is the code I used with he -s 0 option. I run it also with option-s 2.

For most of my samples I have got: successfully assigned alignments: 36 % (-s 0) and 35 % (-s 2)

For a few of my samples:

successfully assigned alignments: 1.6 % (-s 0) and no percentage (-s 2)
successfully assigned alignments: 0.6 % (-s 0) and no percentage (-s 2)
successfully assigned alignments: 3.2 % (-s 0) and no percentage (-s 2)

It seems that for these few samples with low assigned alignments there are not assigned alignments when I run feature counts code with the reversely stranded option.

I checked the alignments results in log.final.out:

for the samples with assigned alignment around 35 % Uniquely mapped reads % ~ 70-75 % % of reads mapped to multiple loci ~ 25-30%
for the samples with assigned alignment around 1 % Uniquely mapped reads % ~ 15-20 % % of reads mapped to multiple loci ~ 75 - 80 %

Aligned.sortedByCoord.out.bam

for the samples with assigned alignment around 35 % size around 5 000 000 000 bytes
for the samples with assigned alignment around 1 % size around 40 000 000 000 bytes

Questions:

Is having around 35-36% assigned alignments normal?
Why do few samples have so low assigned alignments in -s 0 option and not at all assigned alignments in s- 2?
Why do the difference in Uniquely mapped reads and % of reads mapped to multiple loci have opposite values (%) in samples with 35% assigned alignments compared to samples with 1% assigned alignments?

I hope I have explained my issue clearly enough to make it understandable. Best regards, Francesca

#!/bin/bash

# MAKE NEW DIRECTORY FOR EACH SAMPLE IN ALIGNMENTS FOLDER
mkdir -p /ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/

# RUN STAR FOR EACH SAMPLE
/ludc/Tools/Software/STAR/2.7.1a/bin/STAR --genomeDir /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14 \
--sjdbGTFfile /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14/gencode.v46.primary_assembly.basic.annotation.gtf \
--readFilesIn \
/ludc/Raw_Data_Archive/Sequencing/Rna_Seq/FoetalforNCD/Raw_Content/fastq/1334/1334_S11_R1_001.fastq.gz \
/ludc/Raw_Data_Archive/Sequencing/Rna_Seq/FoetalforNCD/Raw_Content/fastq/1334/1334_S11_R2_001.fastq.gz \
--outFilterMismatchNmax 20 --outFilterType BySJout --runThreadN 12 --alignSJDBoverhangMin 1 --sjdbOverhang 99 --outFilterMismatchNoverLmax 0.04 --outSAMtype BAM SortedByCoordinate --alignIntronMin 20 --alignIntronMax 1000000 --outReadsUnmapped Fastx --readFilesCommand zcat \
--outSAMstrandField intronMotif \
--outFileNamePrefix /ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/

# RUN FEATURECOUNTS FOR EACH SAMPLE
echo FEATURECOUNTS IS WORKING WITH SAMPLE 1334_S11
/ludc/Tools/Software/Subread/1.6.4/bin/featureCounts -a /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14/gencode.v46.primary_assembly.basic.annotation.gtf \
/ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/Aligned.sortedByCoord.out.bam -p -T 12 -s 0 -C -O \
-o /ludc/Active_Projects/PONA/Private/Update_2024/Counts/featureCounts_Gene/1334_S11.gene_counts.txt

# RUN FEATURECOUNTS FOR EACH SAMPLE ON EXON LEVEL
echo FEATURECOUNTS IS WORKING WITH SAMPLE 1334_S11
/ludc/Tools/Software/Subread/1.6.4/bin/featureCounts -a /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14/gencode.v46.primary_assembly.basic.annotation.gtf \
/ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/Aligned.sortedByCoord.out.bam -p -f -T 12 -s 0 -C -O \
-o /ludc/Active_Projects/PONA/Private/Update_2024/Counts/featureCounts_Exon/1334_S11.exon_counts.txt

echo ANALYSIS FOR SAMPLE 1334_S11 IS FINISHED

alignments RNA-seq • 462 views

ADD COMMENT • link updated 5 weeks ago by Ram 44k • written 5 weeks ago by francesca.catellani • 0

2

Entering edit mode

At least from my experience, given these are (I assume) human samples and the GRCh38 reference genome, I would expect a much higher % of uniquely mapped reads (from the STAR mapping output). I would suggest running something like FastQC on the FASTQ files to check if there might some oddities in the raw reads.

The -s 0/1/2 depend on how your library was prepped. If it was stranded then you need to choose either -s 1 (forward) or -s 2 reverse. Since you also have R1 and R2 i.e. paired-end, it might also be good to include -p and --countreadpairs in the featureCoutns command.

From some other posts (e.g. STAR output - high multi-mapping, low alignment (human)) the high % of multiple-mapped loci could also be due to rRNA sequences, which could also be investigated.

ADD REPLY • link 5 weeks ago by DGTool ▴ 290