bcl2fastq: Could not parse the CSV stream text
6.6 years ago

Also posted on bioinformatics stackexchange.

I am trying to run bcl2fastq to generate FASTQ files from the BCL files I got from a 10X single-cell experiment run. When I run bcl2fastq, I get the following exception:

https://ibb.co/dOuSxH

For that I am using the following Bash script, generate_fastq.sh, which I wrote myself:

 #!/bin/bash

 FLOWCELL_DIR="/scratch/nv4e/kipnis/180403_NB501830_0158_AHN3LLBGX5"
 OUTPUT_DIR="/scratch/nv4e/kipnis/fastq"
 INTEROP_DIR="/scratch/nv4e/kipnis/180403_NB501830_0158_AHN3LLBGX5/InterOp"
 SAMPLE_SHEET_PATH="/scratch/nv4e/kipnis/sample_sheet.csv"

 bcl2fastq --use-bases-mask=Y26,I8,Y98 --create-fastq-for-index-reads \
     --minimum-trimmed-read-length=8 --mask-short-adapter-reads=8 \
     --ignore-missing-positions --ignore-missing-controls \
     --ignore-missing-filter --ignore-missing-bcls \
     -r 6 -w 6 -R "${FLOWCELL_DIR}" --output-dir="${OUTPUT_DIR}" \
     --interop-dir="${INTEROP_DIR}" --sample-sheet="${SAMPLE_SHEET_PATH}"

So apparently something is wrong with my sample sheet. RunInfo.xml lists 3 reads:

https://ibb.co/h2WqHH

I used the 10x Genomics sample sheet generator (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/bcl2fastq-direct) and got the following file, sample_sheet.csv:

 [Header]
 EMFileVersion,4

 [Reads]
 26
 8
 98

 [Data]
 Lane,Sample_ID,Sample_Name,index,Sample_Project
 1,SI-GA-B4_1,17R,ACTTCATA,Chromium_20180406
 1,SI-GA-B4_2,17R,GAGATGAC,Chromium_20180406
 1,SI-GA-B4_3,17R,TGCCGTGG,Chromium_20180406
 1,SI-GA-B4_4,17R,CTAGACCT,Chromium_20180406 
 1,SI-GA-B5_1,19RL,AATAATGG,Chromium_20180406
 1,SI-GA-B5_2,19RL,CCAGGGCA,Chromium_20180406
 1,SI-GA-B5_3,19RL,TGCCTCAT,Chromium_20180406
 1,SI-GA-B5_4,19RL,GTGTCATC,Chromium_20180406
 2,SI-GA-B4_1,17R,ACTTCATA,Chromium_20180406
 2,SI-GA-B4_2,17R,GAGATGAC,Chromium_20180406
 2,SI-GA-B4_3,17R,TGCCGTGG,Chromium_20180406
 2,SI-GA-B4_4,17R,CTAGACCT,Chromium_20180406
 2,SI-GA-B5_1,19RL,AATAATGG,Chromium_20180406
 2,SI-GA-B5_2,19RL,CCAGGGCA,Chromium_20180406
 2,SI-GA-B5_3,19RL,TGCCTCAT,Chromium_20180406
 2,SI-GA-B5_4,19RL,GTGTCATC,Chromium_20180406
 3,SI-GA-B4_1,17R,ACTTCATA,Chromium_20180406
 3,SI-GA-B4_2,17R,GAGATGAC,Chromium_20180406
 3,SI-GA-B4_3,17R,TGCCGTGG,Chromium_20180406
 3,SI-GA-B4_4,17R,CTAGACCT,Chromium_20180406
 3,SI-GA-B5_1,19RL,AATAATGG,Chromium_20180406
 3,SI-GA-B5_2,19RL,CCAGGGCA,Chromium_20180406
 3,SI-GA-B5_3,19RL,TGCCTCAT,Chromium_20180406
 3,SI-GA-B5_4,19RL,GTGTCATC,Chromium_20180406
 4,SI-GA-B4_1,17R,ACTTCATA,Chromium_20180406
 4,SI-GA-B4_2,17R,GAGATGAC,Chromium_20180406
 4,SI-GA-B4_3,17R,TGCCGTGG,Chromium_20180406
 4,SI-GA-B4_4,17R,CTAGACCT,Chromium_20180406
 4,SI-GA-B5_1,19RL,AATAATGG,Chromium_20180406
 4,SI-GA-B5_2,19RL,CCAGGGCA,Chromium_20180406
 4,SI-GA-B5_3,19RL,TGCCTCAT,Chromium_20180406 
 5,SI-GA-B5_4,19RL,GTGTCATC,Chromium_20180406

What is wrong with my .csv? What am I doing wrong?

6.6 years ago
GenoMax

Use the cellranger mkfastq method shown in my previous post (C: scRNA-seq data processing from 10X device) to demultiplex the data:

cellranger mkfastq --id=my_id \
                     --run=/path/to/illumina_data_folder \
                     --csv=samplesheet.csv

This sample sheet is not exactly in the format that bcl2fastq uses, but it will work with cellranger.


cellranger produced the same error: Could not parse the CSV stream text.

Here is the more detailed error from the generated _stderr file: https://ibb.co/bxiJ4x


It looks like a carriage return/line feed difference (Windows CRLF vs. Unix LF line endings). You can run dos2unix file.csv to convert CRLF to LF. If dos2unix is not on your system, you will need to install it or use an equivalent tool.
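If dos2unix is missing, the same conversion can be done with standard tools. A minimal Bash sketch, assuming tr, wc, cat and mktemp are available (they are on typical Linux systems); count_cr and strip_cr are hypothetical helper names, not part of bcl2fastq or cellranger:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of bcl2fastq or cellranger) for detecting
# and removing carriage returns (\r, shown as ^M) without dos2unix.

# Print the number of carriage-return bytes in a file; 0 means no CRs left.
count_cr() {
  local n
  n=$(tr -cd '\r' < "$1" | wc -c)
  echo $((n))   # arithmetic expansion drops the padding some wc versions add
}

# Delete every carriage return in place (turns CRLF line endings into LF).
# The result goes to a temporary file first and is then copied back with
# cat, which keeps the original file's permissions.
strip_cr() {
  local tmp
  tmp=$(mktemp) || return 1
  if tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"; then
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, strip_cr /scratch/nv4e/kipnis/sample_sheet.csv followed by count_cr on the same path should print 0. It matters to check the exact path given to --sample-sheet (or --csv): a cleaned copy written to a different file does not help if the command still points at the original.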


I just got the same error with bcl2fastq on my own project: someone thought it was clever to spell 'naive' with a diaeresis ('naïve'). Since the visible characters look okay in what you posted, it is probably an invisible (whitespace) character, as GenoMax suggested.
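A quick way to spot characters like that, whether a stray carriage return or an accented letter, is to list every line containing a byte outside printable ASCII. A minimal sketch, assuming GNU or BSD grep and cat; find_odd_chars is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every line (with its line number) that contains
# a byte outside printable ASCII, i.e. anything other than space through '~'.
# This catches carriage returns, tabs and non-ASCII letters such as the "ï"
# in "naïve". cat -v then makes the offending bytes visible: a carriage
# return is shown as ^M and non-ASCII bytes as M-... sequences.
find_odd_chars() {
  # LC_ALL=C makes the bracket range compare raw bytes, not locale characters.
  LC_ALL=C grep -n '[^ -~]' "$1" | cat -v
}
```

For example, find_odd_chars sample_sheet.csv; no output means every line is plain printable ASCII.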


dos2unix is not installed, so I tried tr -d '\r' < input > output and perl -pi -e 's/\r\n/\n/g' input, both from the following thread:

https://unix.stackexchange.com/questions/277217/how-to-install-dos2unix-on-linux-without-root-access

But the error stays the same.


Why does your error show the reads settings as follows (see your screen cap below)?

26
98
98

(Screenshot: stderr output)

That should be

26
98

correct?


No, it should be

26
8
98

I have 3 reads. I do not understand why the _stderr file shows that, because I am feeding it the correct file. That seems very weird to me.
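To compare what the file on disk actually says with what the _stderr log reports, it can help to print just the [Reads] section of the exact file being passed in, with hidden characters shown. A minimal sketch, assuming an Illumina Experiment Manager-style sheet (bracketed section headers) and standard awk and cat; show_reads_section is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the [Reads] section of an IEM-style sample
# sheet, piped through cat -v so a trailing carriage return shows up as ^M
# instead of being invisible.
show_reads_section() {
  awk '
    # Any line starting with "[" opens a new section; remember whether it is [Reads].
    /^[ \t]*\[/ { in_reads = ($0 ~ /^[ \t]*\[Reads\]/); next }
    # Inside [Reads], print every line with at least one field. Empty or
    # space-only lines are skipped, but a lone \r still counts as content.
    in_reads && NF > 0 { print }
  ' "$1" | cat -v
}
```

For example, show_reads_section /scratch/nv4e/kipnis/sample_sheet.csv. If this prints different values from the ones in the _stderr log, the tool is probably not reading the file you think it is.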


Why are you modifying the output from the official sample sheet generator?

It should be:

[Reads]
26
98

You don't have 3 reads. You have 2 reads and an index read.
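The same split can be read off the run itself: in RunInfo.xml each <Read> element carries NumCycles and IsIndexedRead, and, consistent with the point above, only the reads with IsIndexedRead="N" go under [Reads], while the index read appears as I<cycles> in the bases mask. A minimal sketch, assuming each <Read .../> element sits on a single line (as in typical RunInfo.xml files) and using grep/sed text matching rather than a real XML parser; reads_from_runinfo is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the sample-sheet [Reads] lines and a matching
# bcl2fastq --use-bases-mask value from RunInfo.xml.
# Assumption: every <Read .../> element is on one line and has NumCycles and
# IsIndexedRead attributes. This is text matching, not real XML parsing.
reads_from_runinfo() {
  local el cycles indexed mask="" reads=""
  while IFS= read -r el; do
    cycles=$(printf '%s\n' "$el" | sed -E 's/.*NumCycles="([0-9]+)".*/\1/')
    indexed=$(printf '%s\n' "$el" | sed -E 's/.*IsIndexedRead="([YN])".*/\1/')
    if [ "$indexed" = "Y" ]; then
      mask="${mask:+$mask,}I${cycles}"    # index read: I<cycles> in the mask only
    else
      mask="${mask:+$mask,}Y${cycles}"    # sequencing read: Y<cycles> in the mask
      reads="${reads}${cycles}"$'\n'      # ... and one line under [Reads]
    fi
  done < <(grep -o '<Read [^>]*>' "$1")
  printf '[Reads]\n%s' "$reads"
  printf 'bases mask: %s\n' "$mask"
}
```

For a run laid out like this one (26 cycles, an 8-cycle index, 98 cycles), it prints 26 and 98 under [Reads] and the mask Y26,I8,Y98, which matches the --use-bases-mask in the question's script.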


I changed it to two reads; it still does not work. Removing everything above [Data] gives a sample sheet formatting error. Everything looks fine in the sample sheet, so either I need some way to find out what is actually wrong with it (some kind of parsing or validation program that tells me which line is bad), or something else is going wrong.
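Neither Illumina's nor 10x's own checker is reproduced here, but a rough line-by-line check is easy to script. A minimal sketch, assuming an IEM-style sheet with a comma-separated [Data] section and standard awk; check_data_section is a hypothetical helper that only looks for a few common problems and reports each one with its line number:

```bash
#!/usr/bin/env bash
# Hypothetical checker, NOT Illumina's or 10x's validator. It reports, with
# line numbers:
#   - bytes outside printable ASCII (carriage returns, tabs, accented letters)
#   - [Data] rows whose column count differs from the [Data] header row
#   - [Data] fields with leading or trailing spaces/tabs
# Exit status is 1 if anything was reported, 0 otherwise.
check_data_section() {
  LC_ALL=C awk -F',' '
    # 1) Flag any byte outside space..tilde, before it is stripped below.
    /[^ -~]/ { printf "line %d: non-printable or non-ASCII character\n", NR; bad = 1 }
    # 2) Drop carriage returns so the structural checks see clean fields.
    { gsub(/\r/, "") }
    # 3) Section headers such as [Header], [Reads], [Data].
    /^[ \t]*\[/ { in_data = ($0 ~ /^[ \t]*\[Data\]/); header_nf = 0; next }
    # 4) Inside [Data]: the first non-empty line is the column header.
    in_data && NF > 0 {
      if (header_nf == 0) { header_nf = NF; next }
      if (NF != header_nf) {
        printf "line %d: %d fields, header has %d\n", NR, NF, header_nf; bad = 1
      }
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[ \t]|[ \t]$/) {
          printf "line %d: field %d has leading/trailing whitespace\n", NR, i; bad = 1
        }
      }
    }
    END { if (!bad) print "no problems found"; exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, check_data_section /scratch/nv4e/kipnis/sample_sheet.csv. A clean result does not prove the sheet is valid for bcl2fastq; it only rules out these particular problems.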


Are you able to run the test included in the software (look for the tinyBCL dataset)?

Let's make sure your installation works properly.


Sure, their sample test works perfectly.


Now I am close to getting stumped. The problem is clearly your sample sheet file. Is there a SampleSheet.csv file in your raw data folder? If so, can you rename it to something else, to ensure that cellranger is reading the file you made with their tool?
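The check-and-rename step suggested above could look like this. A minimal sketch; set_aside_run_samplesheet is a hypothetical helper name, and the .bak suffix is an arbitrary choice:

```bash
#!/usr/bin/env bash
# Hypothetical helper: if a run folder contains its own SampleSheet.csv,
# rename it to SampleSheet.csv.bak so it is out of the way and cannot be
# picked up instead of the sheet you made. Refuses to overwrite an existing
# .bak file.
set_aside_run_samplesheet() {
  local sheet="$1/SampleSheet.csv"
  if [ ! -e "$sheet" ]; then
    echo "no SampleSheet.csv in $1"
  elif [ -e "$sheet.bak" ]; then
    echo "refusing to overwrite $sheet.bak" >&2
    return 1
  else
    mv -- "$sheet" "$sheet.bak" && echo "renamed $sheet to $sheet.bak"
  fi
}
```

For example, set_aside_run_samplesheet /scratch/nv4e/kipnis/180403_NB501830_0158_AHN3LLBGX5, then rerun with the sheet passed explicitly.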

Can you also verify what @swbarnes2 commented on: C: bcl2fastq: Could not parse the CSV stream text

You can also contact 10x tech support to see if they have a solution.


Oh my god, thank you so much! I cannot tell you how much time this saved me; I actually made this account just in case you see this at some point. This was a spreadsheet that never touched a Windows machine, except for a Microsoft file-sharing server, which apparently was enough to corrupt it. None of the Unix-based software I used had an issue with it until bcl2fastq.

6.6 years ago

Those ^M characters at the end of the lines in your error output are carriage returns (\r), i.e. whitespace characters. That's probably what's messing up the parser.


He said he fixed that in one of the comments.
