Hi,
I am using STAR to align my RNA-seq datasets and I am getting this error:
ReadAlignChunk_processChunks.cpp:115:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >
This is my code
module load star/2.5.3a
STAR -- genomeDir mouse/star_genome_mm10 \
-- readFilesIn L001_R1_001.fastq.gz \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--twopassMode Basic \
--outFilterMultimapNmax 1
--quantMode TranscriptomeSAM \
--runThreadN 6 \
--outFileNamePrefix "STAR_output/Test/"
The FASTQ files look like this:
@NS500540:133:HNFTLBGX5:1:11101:11802:1042 1:N:0:ACTGAT
CTCCGNTTTATTTATTTGTTCTGCAAATTCGATGCGTCTACCTTCAAATAAAGCATTCATCTTTCTCTGTGACTCT
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
Is there something wrong that I am not able to figure out? I have checked the file, and every read ID line starts with @.
Thank you
Are you certain about that? Did you do anything to this file that may have corrupted the format (e.g. improper trimming)?
You can try the validateFiles utility from Jim Kent to see whether your file checks out.
That looks like a useful utility.
When I download it, it is saved as a text file, which I can't run. Can you please let me know how to use it?
You need to add execute permission to it by doing
chmod a+x validateFiles
before you can run it.
There shouldn't be a space in
-- readFilesIn
(it should be --readFilesIn). Please select a title that describes your problem better than this.
Thank you. I will make that change. Also, I changed the title.
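For reference, here is a rough sketch of how the command might look once the spaces after "--" are removed and the line continuation that is missing after --outFilterMultimapNmax 1 is restored. The --readFilesCommand zcat option is my own assumption and was not suggested in the posts above: STAR reads plain text by default, so a gzipped FASTQ usually needs it. The run_star wrapper and the DRY_RUN variable are hypothetical names, used only so the command can be printed and checked before it is actually run.

```shell
# Sketch only: paths and options are copied from the question, with the
# spacing and line-continuation problems fixed.
# Assumption: --readFilesCommand zcat is added because the input is gzipped.
run_star() {
  set -- \
    --genomeDir mouse/star_genome_mm10 \
    --readFilesIn L001_R1_001.fastq.gz \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMunmapped Within \
    --twopassMode Basic \
    --outFilterMultimapNmax 1 \
    --quantMode TranscriptomeSAM \
    --runThreadN 6 \
    --outFileNamePrefix STAR_output/Test/

  # DRY_RUN is a hypothetical switch: by default only print the command.
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "STAR $*"
  else
    STAR "$@"
  fi
}

run_star   # prints the command; use DRY_RUN=0 run_star to execute it
```

Printing the command first makes it easy to spot a leftover "-- " or a line that was cut off by a missing backslash.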
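What are the outputs of
file L001_R1_001.fastq.gz
and
gunzip -c L001_R1_001.fastq.gz | paste - - - - | cut -c 1 | uniq | sort | uniq
?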
The output for
file L001_R1_001.fastq.gz
is
L001_R1_001.fastq.gz: gzip compressed data, extra field
and the output for
gunzip -c L001_R1_001.fastq.gz | paste - - - - | cut -c 1 | uniq | sort | uniq
is
@
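In case it helps anyone reading later, here is a small sketch of what that check does, step by step. The function name fastq_id_first_chars and the demo file are made up for illustration, and sort -u stands in for the uniq | sort | uniq chain. Joining every four lines with paste matters because a quality line can legitimately start with @, so grepping for lines that begin with @ would be misleading.

```shell
# Print the distinct first characters of the read-ID lines in a gzipped FASTQ.
# A well-formed file should print a single "@".
fastq_id_first_chars() {
  gunzip -c "$1" |   # decompress to stdout
    paste - - - - |  # join each 4-line record into one tab-separated line
    cut -c 1 |       # keep the first character of each record (the ID line)
    sort -u          # collapse to the distinct characters
}

# Demo on a tiny, made-up file; the second quality line starts with "@"
# on purpose and must not be mistaken for a read ID.
tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' | gzip > "$tmp/demo.fastq.gz"
fastq_id_first_chars "$tmp/demo.fastq.gz"   # prints: @
rm -r "$tmp"
```

If this prints only @, the records themselves are intact and the problem is more likely in how the file is handed to the aligner than in the FASTQ content.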