How to know whether the genome indices from STAR are correct
2.3 years ago
Chris ▴ 340

Hi Bioinformaticians,

I ran STAR with this command:

STAR --runThreadN ${NSLOTS} \
    --runMode genomeGenerate \
    --genomeDir /home/doan/hg38/hg38_index_new \
    --genomeFastaFiles /home/doan/hg38/Homo_sapiens.GRCh38.dna_sm.prima$ --sjdbGTFfile /home/doan/hg38/Homo_sapiens.GRCh38.107.gtf \
    --sjdbOverhang 99

I got this output:

chrLength.txt geneInfo.tab sjdbInfo.txt
chrNameLength.txt Genome sjdbList.fromGTF.out.tab
chrName.txt genomeParameters.txt sjdbList.out.tab
chrStart.txt Log.out transcriptInfo.tab
exonGeTrInfo.tab SA
exonInfo.tab SAindex

The genome index is 28 GB in total. I then ran the alignment, but all of the output files are 0 bytes. Could anyone please tell me what is wrong?

2.3 years ago
GenoMax 147k

Then I run alignment but the size of all output files is 0.

You are not running an alignment just yet. A STAR genomeGenerate job can take an hour or two for the human genome. If you are looking at the files right after starting the job, they will be zero bytes. If the job completed and some files are still zero bytes, you should look at the log file. If you did not capture the log during the run, you may need to run the job again and capture the standard output and error streams to files.

If you had not generated the genome index then what were you trying to align against in your last thread?
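
One way to capture both streams, as suggested above, is to redirect them to files when the command is launched. This is only a minimal sketch: the helper name `run_logged` and the log prefix are hypothetical, and on an SGE cluster the scheduler's own `-o` and `-e` options to `qsub` serve the same purpose.

```shell
# run_logged PREFIX CMD [ARGS...]   (hypothetical helper, not part of STAR)
# Runs CMD, writing stdout to PREFIX.out and stderr to PREFIX.err,
# and returns CMD's exit status so that a failure is not silently lost.
run_logged() {
    rl_prefix=$1
    shift
    rl_status=0
    "$@" >"${rl_prefix}.out" 2>"${rl_prefix}.err" || rl_status=$?
    # Record the exit status next to the error messages for later inspection.
    printf 'exit status: %s\n' "$rl_status" >>"${rl_prefix}.err"
    return "$rl_status"
}
```

For example, `run_logged star_index STAR --runMode genomeGenerate ...` (with the remaining options from the command in the question) would leave `star_index.out` and `star_index.err` to inspect after the job ends.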


Creating the genome indices took less than 1 hour and produced the output I listed above, but the alignment finished in less than 1 minute, so as you said, something is wrong here. I submit jobs to a server, and a job disappears from the status list when it finishes.


Are you sure the genome generate job completed successfully, i.e. there were no errors? Can you show a listing of the files above so we can see their sizes, e.g. with ls -lh *?
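
Besides reading the listing by eye, zero-byte files can be picked out automatically. The following is a rough sketch, assuming a `find` that supports `-maxdepth` (GNU and BSD versions do); the function name `list_empty_files` is made up for illustration.

```shell
# list_empty_files DIR   (hypothetical helper)
# Prints every zero-byte regular file directly inside DIR, one per line.
# Returns 0 if no empty files were found, 1 otherwise.
list_empty_files() {
    lef_found=$(find "$1" -maxdepth 1 -type f -size 0)
    if [ -n "$lef_found" ]; then
        printf '%s\n' "$lef_found"
        return 1
    fi
    return 0
}
```

It can be pointed at either the index directory or the alignment output directory; an empty result means every file there has at least some content, which says nothing yet about whether that content is complete.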


I run STAR by submitting a script, so an error, if there is one, would not be shown the way it would be if I ran the command directly from the shell.

-rw-r-----. 1 doanc2 doanc2 1.2K Aug 4 14:54 chrLength.txt
-rw-r-----. 1 doanc2 doanc2 3.1K Aug 4 14:54 chrNameLength.txt
-rw-r-----. 1 doanc2 doanc2 1.9K Aug 4 14:54 chrName.txt
-rw-r-----. 1 doanc2 doanc2 2.1K Aug 4 14:54 chrStart.txt
-rw-r-----. 1 doanc2 doanc2 56M Aug 4 14:53 exonGeTrInfo.tab
-rw-r-----. 1 doanc2 doanc2 23M Aug 4 14:54 exonInfo.tab
-rw-r-----. 1 doanc2 doanc2 2.4M Aug 4 14:53 geneInfo.tab
-rw-r-----. 1 doanc2 doanc2 3.0G Aug 4 15:38 Genome
-rw-r-----. 1 doanc2 doanc2 844 Aug 4 15:38 genomeParameters.txt
-rw-r-----. 1 doanc2 doanc2 34K Aug 4 15:38 Log.out
-rw-r-----. 1 doanc2 doanc2 24G Aug 4 15:38 SA
-rw-r-----. 1 doanc2 doanc2 1.5G Aug 4 15:38 SAindex
-rw-r-----. 1 doanc2 doanc2 12M Aug 4 15:34 sjdbInfo.txt
-rw-r-----. 1 doanc2 doanc2 12M Aug 4 14:54 sjdbList.fromGTF.out.tab
-rw-r-----. 1 doanc2 doanc2 8.8M Aug 4 15:34 sjdbList.out.tab
-rw-r-----. 1 doanc2 doanc2 16M Aug 4 14:54 transcriptInfo.tab

I am not sure whether this error is related, but this is the content of a file named star.e101042:

/usr/global/sge/default/spool/fenn03/job_scripts/101042: line 62: let: TOTAL=1659388592 - : syntax error: operand expected (error token is "- ")

(standard_in) 2: syntax error


These files look to be about the right size when I compare them (a qualitative check only), so I am going to hazard a guess that the index should be good.

What do you see when you run tail -n 6 Log.out in the directory above? It should show something like the following if the index is complete.

Jan 29 14:41:26 ... writing SAindex to disk
Writing 8 bytes into .//SAindex ; empty space on disk = 1209157942247424 bytes ... done
Writing 120 bytes into .//SAindex ; empty space on disk = 1209157942247424 bytes ... done
Writing 1565873491 bytes into .//SAindex ; empty space on disk = 1209157942247424 bytes ... done
Jan 29 14:41:33 ..... finished successfully
DONE: Genome generation, EXITING 
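
The same check can be scripted by searching the end of the log for the "finished successfully" line shown in the sample above. This is a minimal sketch; `index_log_ok` is a hypothetical name, and it only looks for that one message.

```shell
# index_log_ok LOGFILE   (hypothetical helper)
# Returns 0 if the last six lines of LOGFILE contain the
# "finished successfully" message, 1 otherwise.
index_log_ok() {
    tail -n 6 "$1" | grep -q 'finished successfully'
}
```

A call such as `index_log_ok Log.out && echo "index build completed"` only makes sense against the log written by the index build itself, not a Log.out from a later run.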

If you copy/pasted the genome generate command from something like a PDF, it is possible that hyphens were converted to "smart hyphens" (are you on macOS by chance?).
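
Converted hyphens are hard to spot by eye, but they are non-ASCII characters, so a job script can be scanned for them. The sketch below assumes the script is supposed to be plain ASCII; the function name `find_non_ascii` is hypothetical.

```shell
# find_non_ascii FILE   (hypothetical helper)
# Prints, with line numbers, every line of FILE that contains a byte which is
# neither printable ASCII nor whitespace (e.g. the bytes of an en dash or a
# curly quote). Returns 0 if such a line was found, 1 if the file is clean.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}
```

Any line it reports is worth retyping by hand in a plain-text editor rather than pasting again.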


Yes, I am on macOS, and I was surprised that when I typed a period here, it was converted to a question mark.

tail -n 6 Log.out

Number of fastq files for each mate = 1

EXITING because of fatal input ERROR: could not open readFilesIn=Read1


I was asking you for the tail of the Log.out file from index creation. It looks like you ran the alignment in the same directory, so that log must have been overwritten by the one for the alignment.

could not open readFilesIn=Read1

Looks like your input file is not in the same directory, or does not have the proper path, or does not have the correct name. Which of the three is the issue?
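
All three possibilities can be ruled out before submitting the job by testing the exact paths that will be passed to --readFilesIn. This is only a sketch; `check_readable` is a hypothetical helper, and the paths given to it are whatever your script uses.

```shell
# check_readable FILE...   (hypothetical helper)
# Prints a message to stderr for each FILE that is missing, unreadable or empty.
# Returns 0 only if every file passed the check.
check_readable() {
    cr_bad=0
    for cr_f in "$@"; do
        if [ ! -r "$cr_f" ]; then
            echo "missing or unreadable: $cr_f" >&2
            cr_bad=1
        elif [ ! -s "$cr_f" ]; then
            echo "empty file: $cr_f" >&2
            cr_bad=1
        fi
    done
    return "$cr_bad"
}
```

Using absolute paths in the call avoids depending on which directory the scheduler starts the job in.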


Sorry for misunderstanding your request. Here is the tail of the Log.out file for index creation.

    tail -n 6 Log.out 

Aug 04 15:38:43 ... writing SAindex to disk
Writing 8 bytes into /home/doanc2/hg38/hg38_index//SAindex ; empty space on disk = 378587739848704 bytes ... done  
Writing 120 bytes into /home/doanc2/hg38/hg38_index//SAindex ; empty space on disk = 378587739848704 bytes ... done  
Writing 1565873491 bytes into /home/doanc2/hg38/hg38_index//SAindex ; empty space on disk = 378587739848704 bytes ... done  
Aug 04 15:38:46 ..... finished successfully  
DONE: Genome generation, EXITING

Looks like your index is OK. So the issue you are having with alignments should not be related to the index.


Thank you so much for your help! Would you have any recommendations for me to fix the alignment issue?


Is this the same error you referred to in your prior thread?

While it is not recommended, you could simply type the STAR command at the login/head node prompt and see whether the job starts (i.e. run it interactively). Be ready to kill the job (Ctrl+C) so it does not actually continue running. Once you know the command works (i.e. it does not generate any errors), you can copy/paste it into your job submission script. This will help you debug issues with file paths etc.
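
If you would rather not rely on pressing Ctrl+C in time, the interactive test can be bounded automatically. The sketch below assumes GNU coreutils `timeout` is installed on the node; `smoke_test` is a hypothetical name, and the time limit is something you would choose yourself.

```shell
# smoke_test SECONDS CMD [ARGS...]   (hypothetical helper)
# Runs CMD for at most SECONDS using GNU coreutils `timeout`.
# Returns 0 if CMD was still running when the limit was reached
# (timeout reports exit status 124) or if CMD finished with status 0;
# otherwise returns CMD's non-zero exit status.
smoke_test() {
    st_limit=$1
    shift
    st_status=0
    timeout "$st_limit" "$@" || st_status=$?
    if [ "$st_status" -eq 124 ]; then
        echo "still running after ${st_limit}s: no immediate error" >&2
        return 0
    fi
    return "$st_status"
}
```

A command that dies at once with a bad path or a mistyped option returns its own error status, while one that gets as far as loading the genome is stopped when the limit expires.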


You answered the title question. Yes, it is the same error. Because the output files after alignment are 0 bytes, converting a bad SAM file to a BAM file does not make any sense. Do I need to create a new thread?


Did you try running the STAR command directly on the terminal prompt as I suggested above?


Aug 09 12:43:41 ..... started STAR run
Aug 09 12:43:41 ..... loading genome
Aug 09 12:45:05 ..... started mapping


OK, whatever this command line is, it seems to be working. Kill this job and then copy this command into your job submission script.


GenoMax, I opened the file created by the submitted job and got this:

cat test.e101168

cat: /tmp/101168.1.all.q/machines: No such file or directory

EXITING because of fatal input ERROR: could not open readFilesIn=Read1

Aug 09 17:53:27 ...... FATAL ERROR, exiting
/usr/global/sge/default/spool/fenn03/job_scripts/101168: line 76: --runMode: command not found
/usr/global/sge/default/spool/fenn03/job_scripts/101168: line 77: --genomeDir: command not found
/usr/global/sge/default/spool/fenn03/job_scripts/101168: line 78: --readFilesIn: command not found
/usr/global/sge/default/spool/fenn03/job_scripts/101168: line 79: --outSAMtype: command not found
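
When the shell reports an option such as `--runMode` as "command not found", it has treated that line as a separate command, which happens when the preceding line is not continued: the trailing backslash is missing, or it is followed by a space or a carriage return. The following sketch detects only the second case and cannot tell which one applies to this script; the function name `find_broken_continuations` is hypothetical.

```shell
# find_broken_continuations SCRIPT   (hypothetical helper)
# Prints, with line numbers, every line of SCRIPT where a backslash is
# followed by trailing whitespace (including a carriage return from
# Windows-style line endings) instead of being the last character.
# Returns 0 if such a line was found, 1 otherwise.
find_broken_continuations() {
    grep -n '\\[[:space:]][[:space:]]*$' "$1"
}
```

If it reports nothing, the next thing to compare is whether every line of the STAR command except the last one ends in a backslash at all.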

The error when running STAR:

cat star.e101169

line 69: let: TOTAL=1660095741 - : syntax error: operand expected (error token is "- ")
(standard_in) 2: syntax error
