Question

Unable to map Reverse read using STAR aligner

5.3 years ago

bnayer26 • 0

Hi, I am new to aligning paired-end data using STAR. I used cutadapt to trim the adapters in my raw files and then used the output trimmed files to run with STAR. When I try to run the code for paired-end as follows:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

I get the following error:

Jan 06 02:50:22 ..... started STAR run
Jan 06 02:50:22 ..... loading genome
Jan 06 02:52:37 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jan 06 02:53:28 ...... FATAL ERROR, exiting

However, if I run the same code by removing the second read of the pair (so I only map the first single-end read) like this:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_1_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

then, surprisingly, it works, and I get this message:

Jan 06 02:09:45 ..... started STAR run
Jan 06 02:09:45 ..... loading genome
Jan 06 02:13:25 ..... started mapping
Jan 06 02:17:21 ..... finished successfully

Next, when I try to run the same single-end mapping, but this time using the reverse read (read 2) with the following code:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_2_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

I get the following error again:

Jan 06 02:45:22 ..... started STAR run
Jan 06 02:45:22 ..... loading genome
Jan 06 02:47:59 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jan 06 02:48:17 ...... FATAL ERROR, exiting

So somehow, my code only works on the forward read of my paired-end trimmed reads.

If it helps, here is the code I used for my cutadapt step:

cutadapt --cores=14 -q 10,10 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o EBN1_L1_1_trimmed.fq.gz -p EBN1_L1_2_trimmed.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_1.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_2.fq.gz

Any suggestions about what I can change would be really helpful, thanks in advance!

EDIT/UPDATE: I just tried running it with a second set of trimmed samples and it seems to have worked. Here is the code I ran:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

Here is what it printed on the screen after running:

Jan 06 02:58:20 ..... started STAR run
Jan 06 02:58:20 ..... loading genome
Jan 06 03:00:42 ..... started mapping
Jan 06 03:13:40 ..... finished successfully

However, how can I use these output files to check whether my code did indeed run successfully? I know the log files contain a lot of useful information, but I am new to this. Could you kindly point me to some resources, other than the STAR manual, that would help me understand why the code worked for one set of files and not the other, when I trimmed both with the exact same cutadapt command?

Thank you once again and I apologise if any of these questions are too basic, I'm still starting out!
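
One way to approach the "did it really work?" question is to read the summary file that STAR writes next to the alignments, named with the output prefix followed by `Log.final.out` (here that would be `EBN2_L2_alignedLog.final.out`). The sketch below is only an illustration, not part of STAR: the function name `parse_star_final_log` is hypothetical, and it assumes the usual layout in which each statistic is written as `name | value` on its own line.

```python
def parse_star_final_log(path):
    """Parse a STAR Log.final.out file into a {field name: value} dict.

    Assumes each statistic is on one line in the form "name | value";
    lines without a "|" (section headers, blank lines) are skipped.
    """
    stats = {}
    with open(path) as handle:
        for line in handle:
            if "|" not in line:
                continue
            key, value = line.split("|", 1)
            stats[key.strip()] = value.strip()
    return stats
```

Fields worth looking at first are typically `Number of input reads`, `Uniquely mapped reads %`, and `% of reads unmapped: too short`. A run can finish "successfully" and still have a low uniquely mapped percentage, so the final message on screen is not enough on its own.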

STAR Aligner RNA-Seq Mapping • 1.7k views

ADD COMMENT • link updated 5.3 years ago by colindaven 7.4k • written 5.3 years ago by bnayer26 • 0

score 0 · Answer 1 · 2020-01-05

5.3 years ago

colindaven 7.4k

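The error is likely due to read 2 being trimmed down to a very short length, which is what the message itself says:

EXITING because of FATAL ERROR in reads input: short read sequence

The empty "Read Sequence====" field in your log suggests that at least one read was trimmed to zero length. By default, cutadapt keeps such reads unless you set a minimum length.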

Try one of the following:

1. Don't trim at all. It should run through. (A bioRxiv preprint from late 2019, https://www.biorxiv.org/content/10.1101/833962v1, describes why trimming is overrated and even unnecessary for modern RNA-seq aligners like STAR.)
2. Set the minimum length parameter in your trimmer to something higher (in cutadapt this is -m / --minimum-length).
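
To confirm that over-trimmed reads are the cause before re-running anything, the read lengths in the trimmed file can be tallied. The following is a minimal sketch using only the Python standard library. The function names are made up for illustration, and it assumes a standard four-line-per-record FASTQ file compressed with gzip.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_gz_path):
    """Return a Counter mapping read length -> number of reads.

    Assumes four lines per FASTQ record, with the sequence on the
    second line of each record.
    """
    lengths = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


def count_reads_shorter_than(lengths, min_len):
    """Sum the reads whose length is strictly below min_len."""
    return sum(n for length, n in lengths.items() if length < min_len)
```

For example, `read_length_counts("EBN1_L1_2_trimmed.fq.gz")[0]` would give the number of reads trimmed to zero length. If that number is above zero for read 2, it would be consistent with the "short read sequence" error.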

ADD COMMENT • link 5.3 years ago by colindaven 7.4k

Hi, thanks for suggesting that. I'll give it a try. I am going to check my trimmed files with FastQC now to see what sequences remain. Other than that, do you think it could be a formatting error? What is the best way to view my trimmed fastq.gz files in the terminal if I just want to see how they look after trimming? Thanks!

ADD REPLY • link 5.3 years ago by bnayer26 • 0

Good idea.

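less x.fastq
zless x.fastq.gz   # plain less shows compressed bytes unless lesspipe is configured
gunzip -c x.fastq.gz | head -n 8   # first two records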

ADD REPLY • link 5.3 years ago by colindaven 7.4k