STAR aligner INPUT ERROR: could not open genomeFastaFile & could not open readFilesIn
3.8 years ago
slin023 • 0

Hi, I am trying to use STAR-2.7.7a to align my library to my genome assembly, but I received these messages:

    Jan 28 23:33:02 ..... started STAR run
Jan 28 23:33:02 ... starting to generate Genome files

EXITING because of INPUT ERROR: could not open genomeFastaFile: /scratch/slin023/scratch/star/asm.contigs.filtered.fasta

Jan 28 23:33:02 ...... FATAL ERROR, exiting

EXITING because of fatal input ERROR: could not open readFilesIn=/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,

Jan 28 23:33:02 ...... FATAL ERROR, exiting

Here is my script:

#!/bin/bash
#SBATCH --qos pq_mdegenna
#SBATCH --account iacc_mdegenna
#SBATCH --partition IB_16C_96G
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --output=log

export PATH=$PATH:/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/

##build Genome indices 

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /scratch/mdegenna/slin023/star/ --genomeFastaFiles /scratch/slin023/scratch/star/asm.contigs.filtered.fasta

##mapping to genome indices 

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --genomeDir /scratch/mdegenna/slin023/star/ --readFilesIn /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz, /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz, /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz, /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz

I checked the space and it seems to be no problem. I also checked that the input read files exist:

    ls -l /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz
-rw-r--r-- 1 slin023 hpc_mdegenna 13251209254 Jan 20 17:55 /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz

Any advice is welcome. Thank you for your time.

assembly RNA-Seq • 4.9k views
ADD COMMENT
3.8 years ago
GenoMax 147k

Looks like you are submitting this job on a cluster. For the genome index generation, it is possible that the directory path noted above was not available on the worker node where the job actually ran. You will need to check this with your local sysadmins.
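
One way to test this from inside the job is sketched below. `check_readable` is a hypothetical helper name, not part of STAR or Slurm; the two paths are copied from the script in the question. It only tells you whether a path is readable from the node where it runs, not why it is missing.

```shell
# check_readable is a hypothetical helper: it reports which of the given
# files cannot be read from the machine where it runs.
check_readable() {
    missing=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "OK       $f"
        else
            echo "MISSING  $f (host: $(uname -n))"
            missing=1
        fi
    done
    return "$missing"
}

# Paths copied from the script in the question. Put this at the top of the
# sbatch script so it runs on the worker node, not on the login node.
check_readable \
    /scratch/slin023/scratch/star/asm.contigs.filtered.fasta \
    /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz \
    || echo "at least one input is not readable from this node"
```

If a path shows as MISSING in the job log but `ls -l` finds it on the login node, the filesystem is probably not mounted on the worker node; if it is missing in both places, the path itself is likely wrong.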

It also appears that you have spaces between the file names after the commas in the mapping step. Since the files are gzipped, you will also need to add --readFilesCommand zcat to the mapping command.
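
To see why the spaces matter, here is a small sketch of how the shell splits the two forms. `count_args` is a hypothetical helper and the file names are shortened placeholders. For paired-end data STAR expects two arguments after --readFilesIn: a comma-separated list of mate 1 files, one space, then a comma-separated list of mate 2 files.

```shell
# count_args is a hypothetical helper that reports how many separate
# arguments the shell passes to a program.
count_args() { echo "$#"; }

# Spaces after the commas: the shell passes four arguments,
# and the first three each end in a stray ",".
count_args female_1P.fastq.gz, female_2P.fastq.gz, male_1P.fastq.gz, male_2P.fastq.gz

# No spaces inside a list, one space between the two lists: two arguments.
count_args female_1P.fastq.gz,male_1P.fastq.gz female_2P.fastq.gz,male_2P.fastq.gz
```

The first call prints 4 and the second prints 2. The trailing comma in the original error message (`readFilesIn=...1P_second.fastq.gz,`) is consistent with the first case.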

ADD COMMENT

Hello, thank you for answering. I tried it, but some inputs still can't be found:

    Jan 29 23:05:43 ..... started STAR run
Jan 29 23:05:43 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=638024656, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 13
Jan 29 23:05:56 ... starting to sort Suffix Array. This may take a long time...
Jan 29 23:06:01 ... sorting Suffix Array chunks and saving them to disk...
Jan 29 23:08:03 ... loading chunks from disk, packing SA...
Jan 29 23:08:20 ... finished generating suffix array
Jan 29 23:08:20 ... generating Suffix Array index
Jan 29 23:09:41 ... completed Suffix Array index
Jan 29 23:09:41 ... writing Genome to disk ...
Jan 29 23:09:42 ... writing Suffix Array to disk ...
Jan 29 23:09:47 ... writing SAindex to disk
Jan 29 23:09:49 ..... finished successfully
Jan 29 23:09:49 ..... started STAR run
Jan 29 23:09:50 ..... loading genome
gzip: /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz: No such file or directory
Jan 29 23:09:53 ..... started mapping
gzip: /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz: No such file or directory
gzip: /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz: No such file or directory
gzip: /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz: No such file or directory
Jan 29 23:42:48 ..... finished mapping
Jan 29 23:42:49 ..... finished successfully

here is my script:

#!/bin/bash
#SBATCH --qos pq_mdegenna
#SBATCH --account iacc_mdegenna
#SBATCH --partition IB_16C_96G
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --output=log

export PATH=$PATH:/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/

##build Genome indices 

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /scratch/mdegenna/slin023/star/ --genomeFastaFiles /scratch/mdegenna/slin023/star/asm.contigs.filtered.fasta

##mapping to Genome indices

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --genomeDir /scratch/mdegenna/slin023/star/ --readFilesCommand zcat /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz --readFilesIn /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz 
ADD REPLY

Not sure if we made progress but it looks like the program ran for 40 min or so. Did it produce any alignment files? You will need to figure out why gzip can't find those sequence files. Are those actually gzipped?

What do you get if you do file /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz?

Also take this warning into consideration, if you have not already done so.

!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=638024656, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 13
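
For reference, the value in that warning follows the rule given in the STAR manual for small genomes, min(14, log2(GenomeLength)/2 - 1). A minimal sketch of the arithmetic, using the genome size reported in the warning (the variable names are only illustrative):

```shell
# Genome size as reported in the STAR warning.
genome_size=638024656

# min(14, floor(log2(genome_size)/2 - 1)); awk's log() is the natural log,
# so divide by log(2) to get log base 2.
sa_index_nbases=$(awk -v n="$genome_size" 'BEGIN {
    v = int(log(n) / log(2) / 2 - 1)
    if (v > 14) v = 14
    print v
}')

echo "--genomeSAindexNbases $sa_index_nbases"
```

This prints `--genomeSAindexNbases 13`. The option belongs on the `--runMode genomeGenerate` command, so the index has to be rebuilt for it to take effect.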

ADD REPLY

When I typed what you suggested, I got this:

    file /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz
/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)

And I did get an output alignment file, but it's empty:

empty file

So does that mean I have to add --genomeSAindexNbases 13 to scale it down?

ADD REPLY

Are you working on Windows? The file seems to be present in that location, which is good. Can you show us the output of zcat /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz | head -4? I just want to make sure the file is compressed with gzip.
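
A slightly more thorough version of that check is sketched here. `check_fastq_gz` is a hypothetical helper name; it uses `gzip -t` to verify the whole archive and `gzip -cd` (equivalent to zcat) to show the first record. Note that `gzip -t` reads the entire file, so it can take a while on a 13 GB input.

```shell
# check_fastq_gz is a hypothetical helper: verify that a file is a valid
# gzip archive, then print its first FASTQ record (4 lines).
check_fastq_gz() {
    f=$1
    if gzip -t "$f" 2>/dev/null; then
        echo "gzip OK: $f"
        gzip -cd "$f" | head -n 4
    else
        echo "not a valid gzip file: $f"
        return 1
    fi
}

# Usage (path from the question):
#   check_fastq_gz /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz
```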

And you are correct in that the alignment file is empty. Can you check Log* files to see if there is any diagnostic information in them?

ADD REPLY

The zcat command shows:

    @GWNJ-0901:693:GW2009043357th:7:1101:2372:1309 1:N:0:NTCCTCCT+NCAAGCAA
TTCAGAGACAACAGAGGGAGTAATCAACTTGTACTGAGGAACTTCTTTGTACAACTTGTCGTAGGTGGCTGTGTCGAAAAGCACTTGATGGTTAAGCT
+
AAAF7-AFFJF<AJFFFFFF7AFJ-J<F--<FFFFJJJJJJAJF7<7AJFJFFFA7JA--A-<FA-<F-7-7A-AF-FJFJ-AFF-<<A-FJ<-AF7A

I am not sure if I typed it right:

 [slin023@u01 star]$ Log*
-bash: Log.final.out: command not found

And no, I am working on a Mac.

ADD REPLY

The data file looks good and should be readable by STAR.

I was asking you to open those Log files (you can use TextEdit on a Mac) and see if there are additional error messages in there that may help us.
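
If the files live on the cluster, they can also be read directly in the terminal. A small sketch, assuming you are in the directory where STAR wrote its output; `scan_star_log` is a hypothetical helper name and the search pattern is just a guess at the useful keywords:

```shell
# scan_star_log is a hypothetical helper: list the lines of a STAR log
# that mention errors or warnings, with their line numbers.
scan_star_log() {
    grep -n -E 'EXITING|FATAL|ERROR|WARNING' "$1"
}

# Usage, from the run directory:
#   scan_star_log Log.out     # full run log, including the parsed command line
#   cat Log.final.out         # mapping summary, written when mapping completes
#   less Log.out              # page through the whole log (press q to quit)
```

Typing `Log*` on its own makes the shell try to execute the first matching file as a command, which is why it answered "command not found".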

ADD REPLY

This is the "Log.final.out" file:

                                     Started job on |   Jan 29 23:09:49
                             Started mapping on |   Jan 29 23:09:53
                                    Finished on |   Jan 29 23:42:49
       Mapping speed, Million of reads per hour |   684.01

                          Number of input reads |   375447087
                      Average input read length |   145
                                    UNIQUE READS:
                   Uniquely mapped reads number |   229587550
                        Uniquely mapped reads % |   61.15%
                          Average mapped length |   141.84
                       Number of splices: Total |   50408522
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   49668929
                       Number of splices: GC/AG |   165934
                       Number of splices: AT/AC |   11426
               Number of splices: Non-canonical |   562233
                      Mismatch rate per base, % |   0.33%
                         Deletion rate per base |   0.03%
                        Deletion average length |   2.29
                        Insertion rate per base |   0.07%
                       Insertion average length |   2.28
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   100217403
             % of reads mapped to multiple loci |   26.69%
        Number of reads mapped to too many loci |   17089832
             % of reads mapped to too many loci |   4.55%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   22419586
                 % of reads unmapped: too short |   5.97%
                Number of reads unmapped: other |   6132716
                     % of reads unmapped: other |   1.63%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%

Is it worth uncompressing all the files and running it?

ADD REPLY

This is the "Log.out" file:

    STAR version=2.7.7a
STAR compilation time,server,dir=Mon Dec 28 13:38:40 EST 2020 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
##### Command Line:
/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --genomeDir /scratch/mdegenna/slin023/star/ --readFilesCommand zcat /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz
##### Initial USER parameters from Command Line:
###### All USER parameters from Command Line:
runThreadN                    16     ~RE-DEFINED
genomeDir                     /scratch/mdegenna/slin023/star/     ~RE-DEFINED
readFilesCommand              zcat   /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz        ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
runThreadN                        16
genomeDir                         /scratch/mdegenna/slin023/star/
readFilesCommand                  zcat   /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz   

-------------------------------
##### Final effective command line:
/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR   --runThreadN 16   --genomeDir /scratch/mdegenna/slin023/star/   --readFilesCommand zcat   /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq.gz,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq.gz   
----------------------------------------

Number of fastq files for each mate = 1

   Input read files for mate 1 :
ls: cannot access Read1: No such file or directory

EXITING: because of fatal INPUT file error: could not open read file: Read1
SOLUTION: check that this file exists and has read permision.

Feb 03 09:52:23 ...... FATAL ERROR, exiting
ADD REPLY

Is it worth uncompressing all the files and running it?

No. Your alignment is working.

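You have not provided the crucial option --readFilesIn right before the fastq file names. Your Log.out shows the consequence: the file list was parsed as part of --readFilesCommand, and STAR then looked for the default read file, Read1. Can you add that option? You also need to separate the list of R1 files from the list of R2 files with a space.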

--readFilesIn Sample1_R1.fq,Sample2_R1.fq Sample1_R2.fq,Sample2_R2.fq
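
A sketch of building those two lists so that the mates stay in the same order. `join_commas` is a hypothetical helper and the file names are the placeholders from the line above; the script only prints the option rather than launching STAR.

```shell
# join_commas is a hypothetical helper: join its arguments with commas,
# with no spaces in between.
join_commas() {
    out=""
    for item in "$@"; do
        out="${out:+$out,}$item"
    done
    printf '%s\n' "$out"
}

# Placeholder file names. Keep the samples in the same order in both lists
# so that each R1 file lines up with its R2 partner.
r1_list=$(join_commas Sample1_R1.fq Sample2_R1.fq)
r2_list=$(join_commas Sample1_R2.fq Sample2_R2.fq)

# Dry run: print the option instead of running STAR.
echo "--readFilesIn $r1_list $r2_list"
```

This prints `--readFilesIn Sample1_R1.fq,Sample2_R1.fq Sample1_R2.fq,Sample2_R2.fq`. For gzipped inputs, `--readFilesCommand zcat` goes on the same command line as a separate option.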
ADD REPLY

Hello, I actually took someone's advice and gunzipped the files. I received this (the output starts at the tail of the --genomeSAindexNbases warning):

recommended --genomeSAindexNbases 13

Feb 03 14:43:44 ... starting to sort Suffix Array. This may take a long time...
Feb 03 14:43:50 ... sorting Suffix Array chunks and saving them to disk...
Feb 03 14:45:59 ... loading chunks from disk, packing SA...
Feb 03 14:46:16 ... finished generating suffix array
Feb 03 14:46:16 ... generating Suffix Array index
Feb 03 14:48:00 ... completed Suffix Array index
Feb 03 14:48:00 ... writing Genome to disk ...
Feb 03 14:48:01 ... writing Suffix Array to disk ...
Feb 03 14:48:07 ... writing SAindex to disk
Feb 03 14:48:09 ..... finished successfully
Feb 03 14:48:09 ..... started STAR run
Feb 03 14:48:11 ..... loading genome
Feb 03 14:48:14 ..... started mapping

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@GWNJ-0901:693:GW2009043357th:7:1106:24728:6565
@GWNJ-0901:693:GW2009043357th:7:1101:23328:59059 1:N:0:TTCCTCCT+AACCTCTC
+
SOLUTION: fix your fastq file

Feb 03 14:49:39 ...... FATAL ERROR, exiting

How do I fix the string length? Just in case, here is my script, in which I separate the R1 and R2 files with a space:

#!/bin/bash
#SBATCH --qos pq_mdegenna
#SBATCH --account iacc_mdegenna
#SBATCH --partition IB_16C_96G
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --output=log

export PATH=$PATH:/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/

##build Genome indices 

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /scratch/mdegenna/slin023/star/ --genomeFastaFiles /scratch/mdegenna/slin023/star/asm.contigs.filtered.fasta

##mapping to Genome indices

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --genomeDir /scratch/mdegenna/slin023/star/ --readFilesIn /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_1P_second.fastq /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_2P_second.fastq,/home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-male-Phormia_Filtered_2P_second.fastq 
ADD REPLY

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length

Looks like your fastq files are messed up in some way. What have you done to them in terms of trimming, etc.? I suggest you go back to the original files and let STAR handle soft-clipping the parts of reads that don't map.
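
To locate the damaged records, something like the following sketch may help. `check_fastq_lengths` is a hypothetical helper name, and it assumes plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequences).

```shell
# check_fastq_lengths is a hypothetical helper: report records whose layout
# is broken or whose quality string length differs from the sequence length.
# Lines are classified by position, because a quality line may itself
# legitimately start with "@".
check_fastq_lengths() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad++
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad++
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen; bad++
        }
        END {
            if (NR % 4 != 0) { print "file does not end on a record boundary"; bad++ }
            exit (bad > 0)
        }
    ' "$1"
}

# Usage (path from the script above); show only the first few problems:
#   check_fastq_lengths /home/data/FLAG/PhormiatranscriptsSep2020/Blowfly-female-Phormia_Filtered_1P_second.fastq | head
```

Once one record is out of frame, every following record is reported too, so the first reported line is the one to inspect.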

ADD REPLY

Hello, just to let you know that I figured it out. Trimmomatic had messed up the files, and STAR soft-clips reads on its own anyway, so I used the original library, created an unsorted BAM file, and sorted the output with Samtools. Thank you for your help!

#!/bin/bash
#SBATCH --qos pq_mdegenna
#SBATCH --account iacc_mdegenna
#SBATCH --partition IB_16C_96G
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --output=log

export PATH=$PATH:/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/

module load samtools-1.9-gcc-8.2.0-o53igvd  

##build Genome indices 

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /scratch/mdegenna/slin023/star/ --genomeFastaFiles /scratch/mdegenna/slin023/star/asm.contigs.filtered.fasta

##mapping to Genome indices

/home/slin023/STAR-2.7.7a/bin/Linux_x86_64/STAR --runThreadN 16 --genomeDir /scratch/mdegenna/slin023/star/ --readFilesIn /scratch/mdegenna/slin023/star/Blowfly-male-PhormiaSecond_R1_001.fastq,/scratch/mdegenna/slin023/star/Blowfly-female-PhormiaSecond_R1_001.fastq /scratch/mdegenna/slin023/star/Blowfly-male-PhormiaSecond_R2_001.fastq,/scratch/mdegenna/slin023/star/Blowfly-female-PhormiaSecond_R2_001.fastq --outSAMtype BAM Unsorted 

samtools sort /scratch/mdegenna/slin023/star/Aligned.out.bam -o AlignedSorted.out.bam
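
Two optional refinements to this script, sketched as a dry run: it rebuilds the index on every submission, and the index is still built without the --genomeSAindexNbases 13 value from the earlier warning. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences, and checking for `SAindex` is only an assumed way to detect an existing index; the paths are the ones used above.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 in the environment to really run them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

genome_dir=/scratch/mdegenna/slin023/star
threads=16

# Build the index only if it is not there yet. SAindex is one of the files
# STAR writes into --genomeDir; 13 is the value STAR recommended for this genome.
if [ ! -s "$genome_dir/SAindex" ]; then
    run STAR --runThreadN "$threads" --runMode genomeGenerate \
        --genomeDir "$genome_dir" \
        --genomeFastaFiles "$genome_dir/asm.contigs.filtered.fasta" \
        --genomeSAindexNbases 13
fi

# Sort the unsorted BAM using the allocated cores.
run samtools sort -@ "$threads" -o AlignedSorted.out.bam "$genome_dir/Aligned.out.bam"
```

STAR can also write a coordinate-sorted BAM itself with `--outSAMtype BAM SortedByCoordinate`, which would make the separate sort step unnecessary, at the cost of more memory during mapping.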
ADD REPLY
