I have a lot of experience running STAR, and it usually runs fairly quickly on my machine with 65G of RAM, but I've made some changes to how I create the index file, adding --sjdbOverhang 99, --genomeChrBinNbits 11, and --runThreadN 1. I did this because there wasn't enough RAM to build the genome index, which is odd, but I got it to work. I also used a different annotation file this time (gencode.vM25.annotation.gtf). I'm using STAR version STAR_2.5.4b:
STAR --runThreadN 1 --runMode genomeGenerate --genomeDir star --genomeFastaFiles genome/GRCm38.primary_assembly.genome.fa --sjdbOverhang 99 --genomeChrBinNbits 11 --limitGenomeGenerateRAM 16000000000 --sjdbGTFfile genes/gencode.vM25.annotation.gtf
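Side note: if I remember the manual right, the intended knob for a low-RAM genome build is --genomeSAsparseD (a sparser suffix array: less RAM, somewhat slower mapping), which would let --genomeChrBinNbits stay at its default. A rough, untested sketch with the same paths:
STAR --runThreadN 1 --runMode genomeGenerate --genomeDir star --genomeFastaFiles genome/GRCm38.primary_assembly.genome.fa --sjdbGTFfile genes/gencode.vM25.annotation.gtf --sjdbOverhang 99 --genomeSAsparseD 2 --limitGenomeGenerateRAM 16000000000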
Then, when running the mapping step, it takes a REALLY long time (~7.5M reads/hour), and one file alone is >70M paired-end reads. I've also made some changes to how I run the mapping step: I've included --outSAMstrandField intronMotif so I can run StringTie afterwards.
STAR --genomeDir star --readFilesCommand zcat --readFilesIn samples/2_Forward.fq.gz samples/2_Reverse.fq.gz --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 16000000000 --outSAMunmapped Within --twopassMode Basic --outFilterMultimapNmax 1 --quantMode TranscriptomeSAM --outSAMstrandField intronMotif --runThreadN 16 --outFileNamePrefix "2_star/"
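In case it helps, the reads/hour figure can also be watched while the job runs, since STAR writes a progress log under the output prefix:
tail -f 2_star/Log.progress.out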
Here is the Log.final.out:
Started job on | Dec 24 14:55:02
Started mapping on | Dec 24 23:56:13
Finished on | Dec 25 09:08:01
Mapping speed, Million of reads per hour | 7.46
Number of input reads | 68563888
Average input read length | 200
UNIQUE READS:
Uniquely mapped reads number | 62225717
Uniquely mapped reads % | 90.76%
Average mapped length | 199.37
Number of splices: Total | 40579909
Number of splices: Annotated (sjdb) | 40566838
Number of splices: GT/AG | 40149129
Number of splices: GC/AG | 380060
Number of splices: AT/AC | 35737
Number of splices: Non-canonical | 14983
Mismatch rate per base, % | 0.18%
Deletion rate per base | 0.01%
Deletion average length | 1.82
Insertion rate per base | 0.01%
Insertion average length | 1.21
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 0
% of reads mapped to multiple loci | 0.00%
Number of reads mapped to too many loci | 4956534
% of reads mapped to too many loci | 7.23%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 1.91%
% of reads unmapped: other | 0.11%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
I'm guessing it's something to do with available RAM, but I can't for the life of me figure it out.
cat /proc/meminfo
MemTotal: 65978504 kB
MemFree: 23424400 kB
MemAvailable: 63803704 kB
Buffers: 27117588 kB
Cached: 12989668 kB
SwapCached: 0 kB
Active: 5692484 kB
Inactive: 35445240 kB
Active(anon): 469060 kB
Inactive(anon): 630856 kB
Active(file): 5223424 kB
Inactive(file): 34814384 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 2097148 kB
SwapFree: 2097148 kB
Dirty: 72 kB
Writeback: 0 kB
AnonPages: 1030652 kB
Mapped: 385564 kB
Shmem: 69424 kB
Slab: 1138764 kB
SReclaimable: 1067000 kB
SUnreclaim: 71764 kB
KernelStack: 10864 kB
PageTables: 32612 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 35086400 kB
Committed_AS: 3910452 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 3466428 kB
DirectMap2M: 59392000 kB
DirectMap1G: 5242880 kB
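Since a single snapshot could miss a spike, one thing I can do is keep an eye on memory and swap for the whole run, e.g.:
watch -n 30 free -h
At least in the snapshot above swap is untouched (SwapFree equals SwapTotal).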
Does anyone have an idea as to what is making STAR go so slow? I SHOULD have plenty of RAM to do the job... Am I crazy??
You are probably trying to use too many threads with 64G of RAM. Can you try a lower number of threads (say, 8) and see if that helps?
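That is, the same command with only the thread count changed (8 is just a starting point):
STAR --genomeDir star --readFilesCommand zcat --readFilesIn samples/2_Forward.fq.gz samples/2_Reverse.fq.gz --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 16000000000 --outSAMunmapped Within --twopassMode Basic --outFilterMultimapNmax 1 --quantMode TranscriptomeSAM --outSAMstrandField intronMotif --runThreadN 8 --outFileNamePrefix "2_star/"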
I would monitor the job using something like top to see whether it is running or just sleeping due to some I/O problems.
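For example (iostat is part of the sysstat package on most distros):
top -u "$USER"   # S column: R = running on CPU, D = uninterruptible I/O wait
iostat -x 5      # a %util near 100 on the data disk points to an I/O bottleneck
If the STAR threads spend most of their time in the D state, the disk (or a network filesystem) is the limiting factor rather than CPU or RAM.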
It does ultimately do the job... so it is working, just excruciatingly slow. It's weird because it works fine using
Here is the top output-