Feature Counts: low assigned alignments (1.9%) with -s 0: Unstranded option any assigned alignments with -s 2: reversely strandedoptionts
0
0
Entering edit mode
5 weeks ago

Hi!

I am doing RNA-sequencing analysis. I did STAR alignment (version 2.7.1a) and then Feature Counts (version 1.6.4). I tried the two option, -s 0 (unstranded) and -s 2 (reversely stranded.

The code attached below is the code I used with he -s 0 option. I run it also with option-s 2.

For most of my samples I have got: successfully assigned alignments: 36 % (-s 0) and 35 % (-s 2)

For a few of my samples:

  • successfully assigned alignments: 1.6 % (-s 0) and no percentage (-s 2)
  • successfully assigned alignments: 0.6 % (-s 0) and no percentage (-s 2)
  • successfully assigned alignments: 3.2 % (-s 0) and no percentage (-s 2)

It seems that for these few samples with low assigned alignments there are not assigned alignments when I run feature counts code with the reversely stranded option.

I checked the alignments results in log.final.out:

  • for the samples with assigned alignment around 35 % Uniquely mapped reads % ~ 70-75 % % of reads mapped to multiple loci ~ 25-30%

  • for the samples with assigned alignment around 1 % Uniquely mapped reads % ~ 15-20 % % of reads mapped to multiple loci ~ 75 - 80 %

Aligned.sortedByCoord.out.bam

  • for the samples with assigned alignment around 35 % size around 5 000 000 000 bytes

  • for the samples with assigned alignment around 1 % size around 40 000 000 000 bytes

Questions:

  1. Is having around 35-36% assigned alignments normal?
  2. Why do few samples have so low assigned alignments in -s 0 option and not at all assigned alignments in s- 2?
  3. Why do the difference in Uniquely mapped reads and % of reads mapped to multiple loci have opposite values (%) in samples with 35% assigned alignments compared to samples with 1% assigned alignments?

I hope I have explained my issue clearly enough to make it understandable. Best regards, Francesca

#!/bin/bash

# MAKE NEW DIRECTORY FOR EACH SAMPLE IN ALIGNMENTS FOLDER
mkdir -p /ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/

# RUN STAR FOR EACH SAMPLE
/ludc/Tools/Software/STAR/2.7.1a/bin/STAR --genomeDir /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14 \
--sjdbGTFfile /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14/gencode.v46.primary_assembly.basic.annotation.gtf \
--readFilesIn \
/ludc/Raw_Data_Archive/Sequencing/Rna_Seq/FoetalforNCD/Raw_Content/fastq/1334/1334_S11_R1_001.fastq.gz \
/ludc/Raw_Data_Archive/Sequencing/Rna_Seq/FoetalforNCD/Raw_Content/fastq/1334/1334_S11_R2_001.fastq.gz \
--outFilterMismatchNmax 20 --outFilterType BySJout --runThreadN 12 --alignSJDBoverhangMin 1 --sjdbOverhang 99 --outFilterMismatchNoverLmax 0.04 --outSAMtype BAM SortedByCoordinate --alignIntronMin 20 --alignIntronMax 1000000 --outReadsUnmapped Fastx --readFilesCommand zcat \
--outSAMstrandField intronMotif \
--outFileNamePrefix /ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/

# RUN FEATURECOUNTS FOR EACH SAMPLE
echo FEATURECOUNTS IS WORKING WITH SAMPLE 1334_S11
/ludc/Tools/Software/Subread/1.6.4/bin/featureCounts -a /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14/gencode.v46.primary_assembly.basic.annotation.gtf \
/ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/Aligned.sortedByCoord.out.bam -p -T 12 -s 0 -C -O \
-o /ludc/Active_Projects/PONA/Private/Update_2024/Counts/featureCounts_Gene/1334_S11.gene_counts.txt

# RUN FEATURECOUNTS FOR EACH SAMPLE ON EXON LEVEL
echo FEATURECOUNTS IS WORKING WITH SAMPLE 1334_S11
/ludc/Tools/Software/Subread/1.6.4/bin/featureCounts -a /ludc/Active_Projects/PONA/Private/Update_2024/Reference_genome/Release_46_GRCh38.p14/gencode.v46.primary_assembly.basic.annotation.gtf \
/ludc/Active_Projects/PONA/Private/Update_2024/Alignments/1334_S11/Aligned.sortedByCoord.out.bam -p -f -T 12 -s 0 -C -O \
-o /ludc/Active_Projects/PONA/Private/Update_2024/Counts/featureCounts_Exon/1334_S11.exon_counts.txt

echo ANALYSIS FOR SAMPLE 1334_S11 IS FINISHED
alignments RNA-seq • 462 views
ADD COMMENT
2
Entering edit mode

At least from my experience, given these are (I assume) human samples and the GRCh38 reference genome, I would expect a much higher % of uniquely mapped reads (from the STAR mapping output). I would suggest running something like FastQC on the FASTQ files to check if there might some oddities in the raw reads.

The -s 0/1/2 depend on how your library was prepped. If it was stranded then you need to choose either -s 1 (forward) or -s 2 reverse. Since you also have R1 and R2 i.e. paired-end, it might also be good to include -p and --countreadpairs in the featureCoutns command.

From some other posts (e.g. STAR output - high multi-mapping, low alignment (human)) the high % of multiple-mapped loci could also be due to rRNA sequences, which could also be investigated.

ADD REPLY

Login before adding your answer.

Traffic: 2420 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6