Entering edit mode
3.4 years ago
vinishavvenugopal
▴
30
Hi,
I used STAR for alignment and featurecounts for quantification. The featurecount successfully assigned fragment
seems to be less (around 25%). It will be really helpful if I find the reason why the matching percentage is low.
It's a paired end sequencing - 83 x 83bp reads
STAR - generating indices
STAR --runThreadN 48 \
--genomeDir /scratch/vvenugo9/First_trial/indices \
--runMode genomeGenerate \
--genomeFastaFiles /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
--sjdbGTFfile /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf \
--sjdbOverhang 82
STAR alignment
STAR --readFilesCommand zcat \
--genomeDir /scratch/vvenugo9/First_trial/indices \
--readFilesIn /scratch/vvenugo9/First_trial/Choroid_C9_ALS1_TAGCTT_R1_001_out.fastq.gz /scratch/vvenugo9/First_trial/Choroid_C9_ALS1_TAGCTT_R2_001_out.fastq.gz \
--runThreadN 48 \
--sjdbGTFfile /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf \
--runMode alignReads \
--outSAMtype BAM Unsorted \
--outSAMmode Full \
--outSAMstrandField intronMotif \
--outFilterType BySJout \
--outSAMunmapped Within \
--outSAMmapqUnique 255 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.1 \
--alignMatesGapMax 1000000 \
--seedSearchStartLmax 5 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignSJoverhangMin 18 \
--alignSJDBoverhangMin 18 \
--chimSegmentMin 18 \
--chimJunctionOverhangMin 18 \
--outSJfilterOverhangMin 18 18 18 18 \
--alignTranscriptsPerReadNmax 50000 \
Featurecount
featureCounts -p -T 48 -t exon -g gene_id -a /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf -s 1 -o fc_CON1_C9_updated_index_and_star.txt /home/vvenugo9/ondemand/data/sys/myjobs/projects/default/11/Aligned.out.bam
Also the QC looks fine, but there's lot of warning in STAR while generating genome indices. Is that something I should worry about?
WARNING: while processing sjdbGTFfile=/scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf: chromosome 'HG1287_PATCH' not found in Genome fasta files for line:
HG1287_PATCH rRNA exon 144126020 144126129 . + . gene_id "ENSG00000252830"; transcript_id "ENST00000517021"; exon_number "1"; gene_name "RNA5SP59"; gene_source "ensembl"; gene_biotype "rRNA"; transcript_name "RNA5SP59-202"; transcript_source "ensembl"; exon_id "ENSE00002089298";`
As an aside, FeatureCounts is quite obsolete for RNA-seq, as it throws out multimappers. You'd be better off using a newer method that accounts for multimapping reads more intelligently, e.g. salmon, RSEM, etc.
Aren't those really old genome versions?
I'm trying to replicate an analysis for publishing purposes