Hi all, while looking at my Log.final.out
file, I noticed that my percentage of uniquely mapped reads is really low.
See below:
Mapping speed, Million of reads per hour | 113.33
Number of input reads | 11050047
Average input read length | 300
UNIQUE READS:
Uniquely mapped reads number | 24100
Uniquely mapped reads % | 0.22%
Average mapped length | 279.06
Number of splices: Total | 31927
Number of splices: Annotated (sjdb) | 6179
Number of splices: GT/AG | 11720
Number of splices: GC/AG | 315
Number of splices: AT/AC | 119
Number of splices: Non-canonical | 19773
Mismatch rate per base, % | 1.65%
Deletion rate per base | 0.02%
Deletion average length | 3.24
Insertion rate per base | 0.02%
Insertion average length | 2.50
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 16802
% of reads mapped to multiple loci | 0.15%
Number of reads mapped to too many loci | 630
% of reads mapped to too many loci | 0.01%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 99.61%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
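As a sanity check on the numbers above, the "name | value" layout of the log is easy to parse. This is only a minimal sketch in Python, assuming the log keeps that layout; parse_star_log and sample are names made up for illustration, and sample holds just a few lines copied from the log above.

```python
def parse_star_log(text):
    """Map each 'name | value' line to {name: value}; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:"
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats


# A few lines copied from the log above (hypothetical variable name).
sample = """\
Number of input reads | 11050047
UNIQUE READS:
Uniquely mapped reads number | 24100
Number of reads mapped to multiple loci | 16802
Number of reads mapped to too many loci | 630
% of reads unmapped: too short | 99.61%
"""

stats = parse_star_log(sample)
total = int(stats["Number of input reads"])
unique = int(stats["Uniquely mapped reads number"])
multi = int(stats["Number of reads mapped to multiple loci"])
too_many = int(stats["Number of reads mapped to too many loci"])

# Reads that aligned anywhere at all, out of everything that went in.
aligned = unique + multi + too_many
print(f"aligned somewhere: {aligned} of {total}")
print(f"uniquely mapped:   {100 * unique / total:.2f}%")
```

In other words, only about 41,500 of the roughly 11 million input reads aligned anywhere, which is consistent with 99.61% being reported as "too short".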
I used STAR to build an index and then aligned the reads against it. My STAR command is as follows:
STAR --genomeDir /home/user/scratch60/hg38_index \
--runThreadN 6 \
--readFilesIn /home/user/scratch60/SRR7059136.fastq \
--outFileNamePrefix /home/user/scratch60/STARresults/SRR7059136 \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--outSAMattributes Standard
If someone could shed some light on this, it would be much appreciated. Thanks!
Have you done QC on the fastq files? And are you sure that you are mapping to the correct reference genome?
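For a quick first look before running a dedicated QC tool, something like the following could summarize read lengths and mean base quality. It is a rough sketch, not a replacement for proper QC: it assumes a plain four-line-per-record FASTQ with Phred+33 qualities, and fastq_records and summarize are hypothetical helper names.

```python
import io
from collections import Counter


def fastq_records(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def summarize(handle, limit=100000):
    """Return (read-length histogram, mean base quality) for the first `limit` reads."""
    lengths = Counter()
    qual_sum = 0
    base_count = 0
    for i, (_, seq, qual) in enumerate(fastq_records(handle)):
        if i >= limit:
            break
        lengths[len(seq)] += 1
        qual_sum += sum(ord(c) - 33 for c in qual)  # assumes Phred+33 encoding
        base_count += len(qual)
    mean_quality = qual_sum / base_count if base_count else 0.0
    return lengths, mean_quality


# Tiny in-memory example; on real data, pass an open file handle instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n!!!!!!\n")
lengths, mean_quality = summarize(demo)
print(dict(lengths), mean_quality)
```

On the real data the handle would come from opening /home/user/scratch60/SRR7059136.fastq. A very low mean quality, or a length distribution that does not match what the submitters describe, would both be worth following up.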
"Reads too short" doesn't really mean that, it just means "didn't map".
As grant says, the first thing to check is whether the fastq is okay or low-quality garbage.
Second, are you sure this is the right reference?
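To put a number on the "too short" label mentioned above: as far as I know, STAR rejects an alignment when the aligned portion falls below a fraction of the read length, and the default for both --outFilterMatchNminOverLread and --outFilterScoreMinOverLread is 0.66. Treat that default as an assumption and check the manual for your STAR version. The arithmetic for a few read lengths, with min_aligned_bases as a made-up helper name:

```python
def min_aligned_bases(read_length, min_fraction=0.66):
    """Bases that must align for a read of this length to pass the filter.

    min_fraction=0.66 is assumed to be STAR's default; verify it for your version.
    """
    return min_fraction * read_length


for length in (100, 150, 300):
    print(length, round(min_aligned_bases(length)))
```

So for the 300-base reads in this log, roughly 198 bases would have to align before STAR reports the read as mapped; anything less ends up in the "too short" category even though the read itself is long.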
Thanks guys. These fastq files are from a published GEO/SRA record, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3109339, and I obtained the fastq via fastq-dump.
They used the GENCODE GRCh38 v26 gene annotation, while I used these to make the index,
but that shouldn't make a difference, should it?
Could you naively try to BLAST a few of the unmapped reads?
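Since 99.61% of the reads are unmapped, almost any read taken straight from the input fastq will be an unmapped one, so a shortcut is to convert the first few records to FASTA and paste them into BLAST. A minimal sketch, assuming a four-line-per-record FASTQ; first_reads_as_fasta is a hypothetical helper name:

```python
import io


def first_reads_as_fasta(handle, n=5):
    """Return the first n FASTQ records as FASTA text (header up to first space)."""
    out = []
    for _ in range(n):
        header = handle.readline().strip()
        if not header:
            break  # fewer than n records in the file
        seq = handle.readline().strip()
        handle.readline()  # the '+' separator line
        handle.readline()  # quality string, not needed for BLAST
        out.append(">" + header[1:].split()[0])
        out.append(seq)
    return "\n".join(out)


# Tiny in-memory example; on real data, pass an open file handle instead.
demo = io.StringIO("@r1 extra\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
print(first_reads_as_fasta(demo, n=1))
```

When looking at the hits, it may be worth noting whether the first and second half of a 300-base read align separately. If they do, compare that against the library layout on the SRA record, since as far as I know fastq-dump writes both mates of a pair as one concatenated read unless it is asked to split them.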
Yeah, they do belong to the right GRCh38 assembly... I'm puzzled!
You lose a lot of reads here...