Hi, I am new to aligning paired-end data with STAR. I used cutadapt to trim the adapters from my raw files and then ran STAR on the trimmed output. When I run the paired-end command as follows:
STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd
I get the following error:
Jan 06 02:50:22 ..... started STAR run
Jan 06 02:50:22 ..... loading genome
Jan 06 02:52:37 ..... started mapping
EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650
Jan 06 02:53:28 ...... FATAL ERROR, exiting
However, if I run the same command without the second read of the pair (so that only read 1 is mapped, as single-end) like this:
STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_1_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd
then, to my surprise, it works and I get this message:
Jan 06 02:09:45 ..... started STAR run
Jan 06 02:09:45 ..... loading genome
Jan 06 02:13:25 ..... started mapping
Jan 06 02:17:21 ..... finished successfully
Next, when I run the same single-end command on the reverse read (read 2) instead:
STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_2_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd
I get the following error again:
Jan 06 02:45:22 ..... started STAR run
Jan 06 02:45:22 ..... loading genome
Jan 06 02:47:59 ..... started mapping
EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650
Jan 06 02:48:17 ...... FATAL ERROR, exiting
So somehow the command only works on the forward read (read 1) of my trimmed paired-end data.
If it helps, here is the command I used for the cutadapt step:
cutadapt --cores=14 -q 10,10 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o EBN1_L1_1_trimmed.fq.gz -p EBN1_L1_2_trimmed.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_1.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_2.fq.gz
Any suggestions about what I can change would be really helpful, thanks in advance!
EDIT/UPDATE: I just tried running it on a second pair of trimmed files and it seems to have worked. Here is the command I used:
STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd
Here is what it printed on the screen after running:
Jan 06 02:58:20 ..... started STAR run
Jan 06 02:58:20 ..... loading genome
Jan 06 03:00:42 ..... started mapping
Jan 06 03:13:40 ..... finished successfully
However, how can I use the output files to check whether the run really was successful? I know the log files contain a lot of useful information, but being new to this, could you point me to some resources other than the STAR manual that would help me understand why the command worked for one set of files and not the other, even though I trimmed both with exactly the same cutadapt command?
Thank you once again, and I apologise if any of these questions are too basic. I'm still starting out!
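On checking the STAR logs from the terminal, here is a minimal sketch. STAR writes Log.out, Log.progress.out and Log.final.out alongside the alignments, with the --outFileNamePrefix value prepended, so the summary for the EBN2_L2 run above should be named EBN2_L2_alignedLog.final.out. The function name star_summary is made up for illustration, and the handful of fields it picks out is only an assumption about which numbers are worth looking at first.

```shell
# Hypothetical helper: print a few headline numbers from a STAR Log.final.out file.
# Usage: star_summary /path/to/<prefix>Log.final.out
star_summary() {
    local log="$1"
    # Refuse to continue if the summary file is missing or empty
    if [ ! -s "$log" ]; then
        echo "missing or empty log: $log" >&2
        return 1
    fi
    # Summary lines have the form "<field name> |<TAB><value>"
    grep -E 'Number of input reads|Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads unmapped: too short' "$log"
}
```

For example, star_summary /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_alignedLog.final.out would print the input read count and the main mapping percentages for that run. Log.out in the same folder holds the full run details, including any error message.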
Hi, thanks for suggesting that. I'll give it a try. I am going to check my trimmed files with FastQC now to see which sequences remain. Other than that, do you think it could be a formatting error? What is the best way to view my trimmed fastq.gz files in the terminal if I just want to see how they look after trimming? Thanks!
Good idea.
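On viewing the trimmed fastq.gz files in the terminal, here is a rough sketch. It assumes the "Read Sequence====" part of the error means STAR hit a read whose sequence line is empty (for example, one that trimming reduced to zero bases), which may or may not be what happened here. The function names peek_fastq and count_empty_reads are hypothetical; they rely only on gzip, head and awk, and assume standard four-line FASTQ records.

```shell
# Hypothetical helpers for inspecting a gzipped FASTQ file (4 lines per record assumed).

# Show the first N records (default 2) of a .fq.gz file.
# Usage: peek_fastq reads.fq.gz [N]
peek_fastq() {
    local n="${2:-2}"
    gzip -cd "$1" | head -n "$((n * 4))"
}

# Count records whose sequence line (2nd line of each record) is empty.
# Usage: count_empty_reads reads.fq.gz
count_empty_reads() {
    gzip -cd "$1" | awk 'NR % 4 == 2 && length($0) == 0 { n++ } END { print n + 0 }'
}
```

For example, count_empty_reads /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz prints the number of empty reads in the read 2 file. A non-zero count would be consistent with the error above. If that turns out to be the cause, cutadapt has a --minimum-length (-m) option that discards reads shorter than a given length, and in paired-end mode it drops the whole pair by default, so the two files stay in sync.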