I have a lot of experience running STAR, and it usually runs fairly quickly on my machine with 65G of RAM, but I've made some changes to how I create the index file, adding --sjdbOverhang 99, --genomeChrBinNbits 11, and --runThreadN 1. I did this because there wasn't enough RAM to build the genome index, which is odd, but I got it to work. I also used a different annotation file this time (gencode.vM25.annotation.gtf). I'm using STAR version STAR_2.5.4b:
STAR --runThreadN 1 --runMode genomeGenerate --genomeDir star --genomeFastaFiles genome/GRCm38.primary_assembly.genome.fa --sjdbOverhang 99 --genomeChrBinNbits 11 --limitGenomeGenerateRAM 16000000000 --sjdbGTFfile genes/gencode.vM25.annotation.gtf
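Side note: if I remember the manual right, the intended knob for a low-RAM genome build is --genomeSAsparseD (a sparser suffix array: less RAM, somewhat slower mapping), which would let --genomeChrBinNbits stay at its default. A rough, untested sketch with the same paths:
STAR --runThreadN 1 --runMode genomeGenerate --genomeDir star --genomeFastaFiles genome/GRCm38.primary_assembly.genome.fa --sjdbGTFfile genes/gencode.vM25.annotation.gtf --sjdbOverhang 99 --genomeSAsparseD 2 --limitGenomeGenerateRAM 16000000000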
Then, when running the mapping step, it takes a REALLY long time (~7.5M reads/hour), and one file alone is >70M paired-end reads. I've also made some changes to how I run the mapping step: I've included --outSAMstrandField intronMotif so I can run StringTie afterwards.
STAR --genomeDir star --readFilesCommand zcat --readFilesIn samples/2_Forward.fq.gz samples/2_Reverse.fq.gz --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 16000000000 --outSAMunmapped Within --twopassMode Basic --outFilterMultimapNmax 1 --quantMode TranscriptomeSAM --outSAMstrandField intronMotif --runThreadN 16 --outFileNamePrefix "2_star/"
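In case it helps, the reads/hour figure can also be watched while the job runs, since STAR writes a progress log under the output prefix:
tail -f 2_star/Log.progress.out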
Here is the Log.final.out:
Started job on | Dec 24 14:55:02
Started mapping on | Dec 24 23:56:13
Finished on | Dec 25 09:08:01
Mapping speed, Million of reads per hour | 7.46
Number of input reads | 68563888
Average input read length | 200
UNIQUE READS:
Uniquely mapped reads number | 62225717
Uniquely mapped reads % | 90.76%
Average mapped length | 199.37
Number of splices: Total | 40579909
Number of splices: Annotated (sjdb) | 40566838
Number of splices: GT/AG | 40149129
Number of splices: GC/AG | 380060
Number of splices: AT/AC | 35737
Number of splices: Non-canonical | 14983
Mismatch rate per base, % | 0.18%
Deletion rate per base | 0.01%
Deletion average length | 1.82
Insertion rate per base | 0.01%
Insertion average length | 1.21
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 0
% of reads mapped to multiple loci | 0.00%
Number of reads mapped to too many loci | 4956534
% of reads mapped to too many loci | 7.23%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 1.91%
% of reads unmapped: other | 0.11%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
I'm guessing it's something to do with available RAM, but I can't for the life of me figure it out.
cat /proc/meminfo
MemTotal: 65978504 kB
MemFree: 23424400 kB
MemAvailable: 63803704 kB
Buffers: 27117588 kB
Cached: 12989668 kB
SwapCached: 0 kB
Active: 5692484 kB
Inactive: 35445240 kB
Active(anon): 469060 kB
Inactive(anon): 630856 kB
Active(file): 5223424 kB
Inactive(file): 34814384 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 2097148 kB
SwapFree: 2097148 kB
Dirty: 72 kB
Writeback: 0 kB
AnonPages: 1030652 kB
Mapped: 385564 kB
Shmem: 69424 kB
Slab: 1138764 kB
SReclaimable: 1067000 kB
SUnreclaim: 71764 kB
KernelStack: 10864 kB
PageTables: 32612 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 35086400 kB
Committed_AS: 3910452 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 3466428 kB
DirectMap2M: 59392000 kB
DirectMap1G: 5242880 kB
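Since a single snapshot could miss a spike, one thing I can do is keep an eye on memory and swap for the whole run, e.g.:
watch -n 30 free -h
At least in the snapshot above swap is untouched (SwapFree equals SwapTotal).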
Does anyone have an idea as to what is making STAR go so slow? I SHOULD have plenty of RAM to do the job... Am I crazy??
You are probably trying to use too many threads with 64G of RAM. Can you try a lower number of threads (say, 8) and see if that helps?
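That is, the same command with only the thread count changed (8 is just a starting point):
STAR --genomeDir star --readFilesCommand zcat --readFilesIn samples/2_Forward.fq.gz samples/2_Reverse.fq.gz --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 16000000000 --outSAMunmapped Within --twopassMode Basic --outFilterMultimapNmax 1 --quantMode TranscriptomeSAM --outSAMstrandField intronMotif --runThreadN 8 --outFileNamePrefix "2_star/"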
I would monitor the job using something like top to see whether it is running or just sleeping due to some I/O problems.
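For example (iostat is part of the sysstat package on most distros):
top -u "$USER"   # S column: R = running on CPU, D = uninterruptible I/O wait
iostat -x 5      # a %util near 100 on the data disk points to an I/O bottleneck
If the STAR threads spend most of their time in the D state, the disk (or a network filesystem) is the limiting factor rather than CPU or RAM.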
It does ultimately do the job... so it is working, just excruciatingly slow. It's weird because it works fine using
Here is the top output-