Unable to map reverse reads with the STAR aligner
4.9 years ago
bnayer26 • 0

Hi, I am new to aligning paired-end data with STAR. I used cutadapt to trim adapters from my raw files and then ran STAR on the trimmed output. When I run STAR in paired-end mode as follows:

STAR --runThreadN 16 --genomeLoad NoSharedMemory \
     --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 \
     --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz \
     --readFilesCommand gunzip -c \
     --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_aligned \
     --outSAMtype BAM Unsorted --outSAMattributes All \
     --alignIntronMax 1000000 --alignEndsType EndToEnd

I get the following error:

Jan 06 02:50:22 ..... started STAR run
Jan 06 02:50:22 ..... loading genome
Jan 06 02:52:37 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jan 06 02:53:28 ...... FATAL ERROR, exiting

However, if I run the same command without the second file of the pair (so that only read 1 is mapped, as single-end), like this:

STAR --runThreadN 16 --genomeLoad NoSharedMemory \
     --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 \
     --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz \
     --readFilesCommand gunzip -c \
     --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_1_aligned \
     --outSAMtype BAM Unsorted --outSAMattributes All \
     --alignIntronMax 1000000 --alignEndsType EndToEnd

then, surprisingly, it works and I get this message:

Jan 06 02:09:45 ..... started STAR run
Jan 06 02:09:45 ..... loading genome
Jan 06 02:13:25 ..... started mapping
Jan 06 02:17:21 ..... finished successfully

Next, when I run the same single-end command on the reverse read (read 2) instead:

STAR --runThreadN 16 --genomeLoad NoSharedMemory \
     --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 \
     --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz \
     --readFilesCommand gunzip -c \
     --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_2_aligned \
     --outSAMtype BAM Unsorted --outSAMattributes All \
     --alignIntronMax 1000000 --alignEndsType EndToEnd

I get the following error again:

Jan 06 02:45:22 ..... started STAR run
Jan 06 02:45:22 ..... loading genome
Jan 06 02:47:59 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jan 06 02:48:17 ...... FATAL ERROR, exiting

So, somehow, my command only works on the forward read (read 1) of my paired-end trimmed data.

If it helps, here is the cutadapt command I used for trimming:

cutadapt --cores=14 -q 10,10 \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -o EBN1_L1_1_trimmed.fq.gz -p EBN1_L1_2_trimmed.fq.gz \
    ~/JeanProject/JeanRawData/EBN1_L1_1.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_2.fq.gz

Any suggestions about what I can change would be really helpful, thanks in advance!

EDIT/UPDATE: I just tried running it on a second set of trimmed samples, and it seems to have worked. Here is the command I used:

STAR --runThreadN 16 --genomeLoad NoSharedMemory \
     --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 \
     --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_2_trimmed.fq.gz \
     --readFilesCommand gunzip -c \
     --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_aligned \
     --outSAMtype BAM Unsorted --outSAMattributes All \
     --alignIntronMax 1000000 --alignEndsType EndToEnd

Here is what it printed on the screen after running:

Jan 06 02:58:20 ..... started STAR run
Jan 06 02:58:20 ..... loading genome
Jan 06 03:00:42 ..... started mapping
Jan 06 03:13:40 ..... finished successfully

However, how can I use the output files to check whether the run really succeeded? I know the log files contain a lot of useful information, but since I'm new to this, could you point me to resources other than the STAR manual that would help me understand why the command worked for one set of files and not the other, given that I trimmed both with exactly the same cutadapt command?
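For reference, STAR writes a run summary to a file named <outFileNamePrefix>Log.final.out (next to Log.out and Log.progress.out). A minimal sketch for pulling out a few headline numbers is below; star_summary is a hypothetical helper name, and the label strings it searches for are assumed to match STAR 2.7.x output, so compare them against your own file.

```shell
#!/usr/bin/env bash
# Sketch: print a few headline numbers from a STAR Log.final.out file.
# star_summary is a hypothetical helper; the label strings below are assumed
# to match STAR 2.7.x output.
star_summary() {
    local log="$1"   # path to <outFileNamePrefix>Log.final.out
    grep -E 'Number of input reads|Uniquely mapped reads %|% of reads unmapped: too short|% of reads unmapped: other' "$log" \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]*[|][[:space:]]*/: /'   # drop padding, turn " |<tab>" into ": "
}
```

Because the prefix in the commands above has no trailing slash, the file for the second sample would be EBN2_L2_alignedLog.final.out in the output directory, e.g. star_summary /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_alignedLog.final.out.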

Thank you once again and I apologise if any of these questions are too basic, I'm still starting out!

STAR Aligner RNA-Seq Mapping
4.9 years ago

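The error is likely caused by some read 2 sequences being trimmed down to a very short (or even zero) length, which STAR reports as "EXITING because of FATAL ERROR in reads input: short read sequence". Both the paired-end run and the read-2-only run stop at the same read (@E00477:565:H7F2CCCX2:3:1108:12702:34676), which points to read 2 of that pair.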

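Try one of the following:

  • Don't trim at all. STAR should then run through (a bioRxiv preprint from late 2019, https://www.biorxiv.org/content/10.1101/833962v1, describes why trimming is overrated and even unnecessary for modern RNA-seq aligners like STAR). This relies on STAR soft-clipping untrimmed adapter ends, which --alignEndsType EndToEnd disables, so drop that option (the default is Local) if you skip trimming.
  • Set the minimum-length parameter in your trimmer to something higher. In cutadapt this is -m / --minimum-length; its default is 0, so reads trimmed down to nothing are kept as empty records. In paired-end mode the whole pair is discarded if either mate falls below the threshold, which keeps the two files in sync.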
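To check whether empty or very short reads really are the cause, and to confirm that a re-trimmed file is clean, a minimal sketch like the following counts reads below a length threshold. count_short_reads is a hypothetical helper, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# Sketch: count reads shorter than a threshold in a gzipped FASTQ file.
# count_short_reads is a hypothetical helper; it assumes 4-line FASTQ records.
# Prints "<count>" or "<count><TAB><first offending read name>".
count_short_reads() {
    local fq_gz="$1"         # e.g. EBN1_L1_2_trimmed.fq.gz
    local min_len="${2:-1}"  # default 1: count only empty (zero-length) reads
    gunzip -c "$fq_gz" | awk -v min="$min_len" '
        NR % 4 == 1 { name = $1 }                 # header line: keep read name
        NR % 4 == 2 && length($0) < min + 0 {     # sequence line shorter than min
            n++
            if (n == 1) first = name              # remember the first offender
        }
        END {
            printf "%d", n + 0
            if (n > 0) printf "\t%s", first
            printf "\n"
        }'
}
```

For example, count_short_reads EBN1_L1_2_trimmed.fq.gz 1 reports how many reads are empty and the name of the first one; running it on EBN1_L1_1_trimmed.fq.gz as well shows whether read 1 is affected too.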

Hi, thanks for the suggestion; I'll give it a try. I'm going to check my trimmed files with FastQC now to see which sequences remain. Other than that, do you think it could be a formatting error? What is the best way to view my trimmed fastq.gz files in the terminal, just to see how they look after trimming? Thanks!


Good idea.

less x.fastq
less x.fastq.gz    (or zless x.fastq.gz, if your less is not set up to decompress .gz files)
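Beyond paging through the file, a read-length histogram makes very short or empty reads easy to spot. read_length_hist below is a hypothetical helper, and like most awk one-liners on FASTQ it assumes 4-line records.

```shell
#!/usr/bin/env bash
# Sketch: print a read-length histogram ("<count> <length>" per line, shortest
# first) for a plain or gzipped FASTQ file. read_length_hist is a hypothetical
# helper; it assumes 4-line FASTQ records.
read_length_hist() {
    local fq="$1"
    case "$fq" in
        *.gz) gunzip -c "$fq" ;;   # decompress gzipped input
        *)    cat "$fq" ;;
    esac | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
}
```

Since the output is sorted by length, read_length_hist EBN1_L1_2_trimmed.fq.gz | head shows the shortest reads first; a line ending in " 0" means the file contains empty reads.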