Question

Featurecount mapping is less

0

Entering edit mode

3.4 years ago

vinishavvenugopal ▴ 30

Hi,

I used STAR for alignment and featurecounts for quantification. The featurecount successfully assigned fragment seems to be less (around 25%). It will be really helpful if I find the reason why the matching percentage is low. It's a paired end sequencing - 83 x 83bp reads

STAR - generating indices

STAR --runThreadN 48 \
--genomeDir /scratch/vvenugo9/First_trial/indices \
--runMode genomeGenerate \
--genomeFastaFiles /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
--sjdbGTFfile /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf \
--sjdbOverhang 82

STAR alignment

STAR --readFilesCommand zcat \
--genomeDir /scratch/vvenugo9/First_trial/indices \
--readFilesIn /scratch/vvenugo9/First_trial/Choroid_C9_ALS1_TAGCTT_R1_001_out.fastq.gz /scratch/vvenugo9/First_trial/Choroid_C9_ALS1_TAGCTT_R2_001_out.fastq.gz \
--runThreadN 48 \
--sjdbGTFfile /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf \
--runMode alignReads \
--outSAMtype BAM Unsorted \
--outSAMmode Full \
--outSAMstrandField intronMotif \
--outFilterType BySJout \
--outSAMunmapped Within \
--outSAMmapqUnique 255 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.1 \
--alignMatesGapMax 1000000 \
--seedSearchStartLmax 5 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignSJoverhangMin 18 \
--alignSJDBoverhangMin 18 \
--chimSegmentMin 18 \
--chimJunctionOverhangMin 18 \
--outSJfilterOverhangMin 18 18 18 18 \
--alignTranscriptsPerReadNmax 50000 \

Featurecount

featureCounts -p -T 48 -t exon -g gene_id -a /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf -s 1 -o fc_CON1_C9_updated_index_and_star.txt /home/vvenugo9/ondemand/data/sys/myjobs/projects/default/11/Aligned.out.bam

Also the QC looks fine, but there's lot of warning in STAR while generating genome indices. Is that something I should worry about?

WARNING: while processing sjdbGTFfile=/scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf: chromosome 'HG1287_PATCH' not found in Genome fasta files for line:
HG1287_PATCH    rRNA    exon    144126020   144126129   .   +   .   gene_id "ENSG00000252830"; transcript_id "ENST00000517021"; exon_number "1"; gene_name "RNA5SP59"; gene_source "ensembl"; gene_biotype "rRNA"; transcript_name "RNA5SP59-202"; transcript_source "ensembl"; exon_id "ENSE00002089298";`

Featurecount STAR RNAseq • 1.1k views

ADD COMMENT • link 3.4 years ago by vinishavvenugopal ▴ 30

1

Entering edit mode

As an aside, FeatureCounts is quite obsolete for RNA-seq, as it throws out multimappers. You'd be better off using a newer method that accounts for multimapping reads more intelligently, e.g. salmon, RSEM, etc.

ADD REPLY • link 3.4 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Aren't those really old genome versions?

ADD REPLY • link 3.4 years ago by swbarnes2 14k

0

Entering edit mode

I'm trying to replicate an analysis for publishing purposes

ADD REPLY • link 3.4 years ago by vinishavvenugopal ▴ 30