Running STAR with --twopassMode Basic

Hi everybody, I am running STAR with and without the option --twopassMode Basic. Without that option the run finishes with no problems, but when I tried to use --twopassMode I got the fatal error below:

EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 6907433764 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 6907433764

But the funny thing is that I am running the tool in a cluster with 250GB RAM and 20 threads, so that fatal error does not make any sense to me. I also tried to increase ulimit -v and ulimit -n, but with no success. Does anyone have an idea what is happening?
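For reference, the limits can be checked, and raised up to the hard limit, in the shell before starting STAR; the 4096 below is just an example value:

#Show all current limits for this shell
ulimit -a

#Soft and hard limits on the number of open files
ulimit -Sn
ulimit -Hn

#Raise the soft open-file limit (only possible up to the hard limit)
ulimit -n 4096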

EDIT: For some reason, when I removed --limitOutSJcollapsed, --limitSjdbInsertNsj and --limitOutSJoneRead from the command line and kept the default values for those parameters, the run with --twopassMode finished perfectly. And thank you h.mon and genomax: it seems that I need to run with fewer threads and make sure that I am the only user on the node, with --exclusive[=user].
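For illustration, a minimal sketch of this kind of two-pass command with the --limit* options left at their defaults (the index path and fastq names below are placeholders, not my actual files):

/path_to/STAR --twopassMode Basic --runThreadN 8 --genomeDir /path_to_STAR_index_directory/ --readFilesIn sample_R1.fastq sample_R2.fastq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample_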

RNA-Seq STAR ulimit

But the funny thing is that I am running the tool in a cluster with 250GB RAM and 20 threads

I assume you are referring to a physical node on the cluster that has those specs (hopefully you have many more nodes like it). It does not automatically mean that all those resources are available for your job, especially if you are using a job scheduler. Are you using one? If so, which one, and what parameters are you using for that job submission?


Hi, I am using interactive mode on the cluster rather than a batch script (I do not know if that answers your question). To start the interactive session I request the resources this way: srun --pty -p interactive --mem 250G -c 20 -t 0-12:00:00 /bin/bash


Let me ask you this: do you see any other user processes running on that node (if you run top or htop)? Since you are not using the --exclusive[=user] option, my assumption is that there may be other jobs running on that node.

BTW, that srun indicates that your cluster uses the SLURM job scheduler, but that you are running an interactive job rather than submitting a batch (non-interactive) job.


Exactly, my cluster uses SLURM. When I ran top and htop, it seemed that I was the only one using the node. I am going to try to open a new session with --exclusive and see what I get.
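For example, keeping the same partition and resource request as before, the new session could be requested like this:

srun --pty -p interactive --exclusive --mem 250G -c 20 -t 0-12:00:00 /bin/bash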


Try to lower the number of threads to 16, or even lower. STAR creates a lot of temporary files for each thread, and you may be hitting the limit of ulimit -n.

I also tried to increase the ulimit -v and ulimit -n, but with no success.

Does "no success" mean that trying to increase the limits resulted in an error message? Or that there was no error message, but the limits were left unchanged? Or that the limits were changed, but STAR crashed anyway? You can only increase the limits up to what is defined in /etc/security/limits.conf, so maybe you can't increase them at all - you may need to contact the system administrators.


Hi! Thanks for your reply. By "no success" I meant that the limits were changed and STAR crashed anyway. I will try to lower the number of threads.


Hi guys, just an update: I tried both suggestions. I ran the SLURM job with the --exclusive flag, and I also ran with a reduced number of cores. I still get the same error in Log.out, and when I tried increasing ulimit -v and ulimit -n, STAR still crashed. If someone has any insight into this problem I will be happy to read it. Anyway, thank you guys for helping me.


Can you post the actual STAR command you are using? What genome are you trying to align against (size)?


Hi there,

You can do two-pass mapping manually by splitting up the process; I read about it in a Google group thread with the developer, Alex. This is an example for single-end data with an average read length of 150 nt.

First you do the first STAR alignment:

#Enter the right directory
cd /path_to_project/gap_table/

#For every Sample file
while read SAMPLE; do

#Get the sample base name
FILEBASE=$(basename "${SAMPLE%.fastq}")

#make new directory for the sample
mkdir /path_to_project/gap_table/$FILEBASE.STAR

#Enter the new directory
cd /path_to_project/gap_table/$FILEBASE.STAR

#align with STAR aligner
/path_to/STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --genomeDir /path_to_STAR_index_directory/star_index_hg38_r149/ --sjdbOverhang 149 --sjdbGTFfile /path_to_genome_GTF_file/Homo_sapiens.GRCh38.91.gtf --outFileNamePrefix /path_to_project/gap_table/$FILEBASE.STAR/ --readFilesIn $SAMPLE 

done < Textfile_with_sample_names.txt
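Here, Textfile_with_sample_names.txt is assumed to hold one fastq path per line, for example:

/path_to_fastq/sampleA.fastq
/path_to_fastq/sampleB.fastq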

After a STAR alignment, a file called SJ.out.tab is generated. This file holds information about every gap (splice junction) that passed the criteria in the first STAR alignment to be counted as such, with at least one read overlapping it. During 2-pass mapping, you want to map the reads of all samples against all gaps found in the first alignments of all samples together. Therefore, in the next step, you gather all SJ.out.tab files and merge them into a single, unique one. This can be done, for example, like this:

Generating fused unique SJ.out.tab

#Collect all SJ.out.tab files in a new directory SJ_tabs
cd /path_to_project/gap_table/

mkdir SJ_tabs

#For every first-pass STAR output directory
for i in /path_to_project/gap_table/*.STAR

#Do the following
do

#Enter the new directory
cd $i

#Get the base name of the current sample
b=$(basename "${i%.STAR}")

#Access files and copy them to new directory with a new sample specific name
cp SJ.out.tab /path_to_project/gap_table/SJ_tabs/$b.sj_out1.tab

done

#Getting only unique entries
cd /path_to_project/gap_table/SJ_tabs/

cat *.tab > all_SJ_tabs.tab

awk '!x[$0]++' all_SJ_tabs.tab > Unique_SJ_all.tab

#Transform the merged SJ table into the chr / intron start / intron end / strand format expected by --sjdbFileChrStartEnd, keeping only junctions with a canonical motif (column 5 > 0)
cd /path_to_project/gap_table/SJ_tabs/
awk 'BEGIN {OFS="\t"; strChar[0]="."; strChar[1]="+"; strChar[2]="-";} {if($5>0){print $1,$2,$3,strChar[$4]}}' /path_to_project/gap_table/SJ_tabs/Unique_SJ_all.tab > SJin.tab

awk '!x[$0]++' SJin.tab > Unique_SJin.tabs

#Clean up unwanted files
cd /path_to_project/gap_table/SJ_tabs/
rm *.tab

cp Unique_SJin.tabs Unique_SJin.tab

rm Unique_SJin.tabs
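Optionally, as a quick sanity check, you can count how many junctions will be inserted into the new index; this number should stay below the --limitSjdbInsertNsj value used in the next step:

#Count the merged junctions
wc -l /path_to_project/gap_table/SJ_tabs/Unique_SJin.tab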

We now generate a new STAR index for the second STAR run, which contains all gaps found in the previous STAR run:

Generate new STAR index

#Go to the right directory
cd /path_to_project/gap_table/

#make new directory with new genome for second STAR pass
mkdir Genome2

#Create Genome2
/path_to/STAR --runMode genomeGenerate --genomeDir Genome2 --genomeFastaFiles /path_to_genome_fasta_file/GRCh38_r91.fa --runThreadN 8 --sjdbGTFfile /path_to_genome_GTF_file/Homo_sapiens.GRCh38.91.gtf --sjdbOverhang 149 --sjdbFileChrStartEnd /path_to_project/gap_table/SJ_tabs/Unique_SJin.tab --limitSjdbInsertNsj 2000000

Using the newly generated STAR index, we now redo the STAR alignment a second time:

Second STAR alignment:

#Go to the right directory
cd /path_to_project/gap_table/

while read SAMPLE; do

#Get base file name (adjust the suffix to match your fastq file extension)
FILEBASE=$(basename "${SAMPLE%.trim.fq}")

#Make new directory for the sample
mkdir /path_to_project/gap_table/$FILEBASE.STAR_second_pass

#Enter the new directory
cd /path_to_project/gap_table/$FILEBASE.STAR_second_pass


#align with STAR aligner
/path_to_STAR/STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --genomeDir /path_to_project/gap_table/Genome2/ --sjdbOverhang 149 --quantMode GeneCounts --sjdbGTFfile /path_to_genome_GTF_file/Homo_sapiens.GRCh38.91.gtf --limitSjdbInsertNsj 2000000 --readFilesIn $SAMPLE

done  < Text_file_with_sample_names.txt

And that's it :)


Hi caggtaagtat, thank you so much for your very detailed reply! It really seems to help with this error!


Hi lucas.caue.jacintho, I'm glad it helped :)


In the STAR help menu, default values are

--alignSJDBoverhangMin 3 --alignSJoverhangMin 5

Do you recommend non-default values, as indicated in your syntax, for specific reason(s)?

--alignSJDBoverhangMin 4 --alignSJoverhangMin 8

I tried both, and in the Log.final.out files it made some difference, but not much.

So I'm curious about your reason(s) for choosing non-default values. Empirical, theoretical, or something else?

Thank you!


Just to be clear, the code above was only meant to show how I do two-pass mapping. I don't recommend the stated parameters as "better" default values. This is just personal preference, which is not empirically tested and is based solely on theoretical reasoning.

I study splicing and alternative splicing, so I wanted to reduce false-positive spliced alignments. That's why I slightly increased these two parameters: a minimal overlap of 3 nucleotides can match by chance roughly 1.6% of the time, but a 4 nt overlap only about 0.4% of the time, if I'm not mistaken.
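For reference, those numbers assume a 1-in-4 chance that a random base matches at each position:

(1/4)^3 = 0.0156, i.e. about 1.6%
(1/4)^4 = 0.0039, i.e. about 0.4%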

Like you said, there is not much of a difference; it just counts reads in a slightly more conservative way.


Thanks for clarifying, and it makes total sense. Cheers!
