BCL files conversion to FASTQ without SampleSheet.csv
3.2 years ago

Dear community,

I have received NGS data that is essentially a BaseCalls folder of .bcl files, and I want to know how to convert the .bcl files to .fastq format. So far I have been using the bcl2fastq program, but I have no SampleSheet.csv file. Because the program crashes without one, I generated my own, like this:

(screenshot: my own generated SampleSheet.csv file)

Is this a good way to do it? Am I missing something? Unfortunately, I have not worked with .bcl data before, and I did not do the sequencing myself.

By the way, bcl2fastq no longer crashes, but it keeps printing this message:

(screenshot: running the program)

fastq demultiplexing bcl2fq-local bcl • 8.7k views
ADD COMMENT

Honestly, what kind of lazy sequencing department throws a bunch of bcls at you and expects you to demultiplex them yourself?

ADD REPLY

I know. However, I have no choice; nobody on my team knows about data analysis. That is why I am asking on this website. :)

ADD REPLY

Ask 10xGenomics. Very few people here have experience trying to run bcl2fastq without access to the whole folder. 10xGenomics should know exactly what is triggering that message.

ADD REPLY
3.2 years ago
GenoMax 147k

Do you have the full raw data folder available? That is a requirement for using bcl2fastq. The error message above refers to a missing RunInfo.xml file. If you don't have the full data folder, then you may need to use IlluminaBasecallsToFastq from Picard tools.

Illumina provides a program called "Illumina Experiment Manager" (Windows only) that will help you create SampleSheet files in the correct format.

ADD COMMENT

Yes, I have the full raw data folder, unfortunately without the SampleSheet.csv file. I have now supplied the RunInfo.xml file; thanks for spotting the error. :)

(screenshot: running the program)

ADD REPLY

You can easily make a SampleSheet file up. I will post an example in a bit if you have not already managed to find one.

Add records one per line as comma separated values and then save the file as SampleSheet.csv.

[Header]
IEMFileVersion,4
Investigator Name,
Experiment Name,
Date,7/8/2021
Workflow,GenerateFASTQ
Application,FASTQ Only
Assay,
Description,
Chemistry,

[Reads]
150 <- `change these values to what you have`
150

[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description

Only the Sample_ID, Sample_Name and index/index2 columns are critical. The rest of the columns can be left blank.
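For anyone who would rather script this than type it by hand, here is a minimal Python sketch that writes the same layout as the example above. The function name `build_sample_sheet`, the sample names, and the index sequences are all made up for illustration; take the real values from your library prep records and the read lengths from your own run.

```python
import csv
import io

# Column order copied from the [Data] header in the example above.
DATA_COLUMNS = [
    "Sample_ID", "Sample_Name", "Sample_Plate", "Sample_Well", "I7_Index_ID",
    "index", "I5_Index_ID", "index2", "Sample_Project", "Description",
]


def build_sample_sheet(samples, read_lengths):
    """Return SampleSheet.csv text.

    samples: iterable of (sample_id, index, index2) tuples.
    read_lengths: cycles per sequencing read, one entry per non-index read.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Header]"])
    writer.writerow(["IEMFileVersion", "4"])
    writer.writerow(["Workflow", "GenerateFASTQ"])
    writer.writerow(["Application", "FASTQ Only"])
    writer.writerow([])
    writer.writerow(["[Reads]"])
    for length in read_lengths:
        writer.writerow([length])
    writer.writerow([])
    writer.writerow(["[Data]"])
    writer.writerow(DATA_COLUMNS)
    for sample_id, index, index2 in samples:
        row = {"Sample_ID": sample_id, "Sample_Name": sample_id,
               "index": index, "index2": index2}
        # Columns that are not critical stay blank.
        writer.writerow([row.get(col, "") for col in DATA_COLUMNS])
    return buf.getvalue()


# Hypothetical sample names and index sequences, for illustration only.
sheet = build_sample_sheet(
    [("liver_ctrl_1", "ATCACGAT", "TTGGCCAA"),
     ("liver_treated_1", "CGATGTAC", "GGTTAACC")],
    read_lengths=[151],
)
print(sheet)
```

Save the returned text as SampleSheet.csv in the run folder, or point bcl2fastq at it with --sample-sheet.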

ADD REPLY

You don't need Sample Name and Sample ID. Sample_ID alone should be fine. However, I strongly recommend giving your samples better names than 1-20. 6 months down the road you will have no idea at all what is what.

ADD REPLY

Thank you for the suggestion. :) I hope I will make some progress.

ADD REPLY

I have done all of that, but somehow the program still responds "Sequencing not finished". I am going to install a Linux system in VirtualBox and then try the same steps there. Maybe the Ubuntu terminal is missing some required libraries. Your help is really appreciated. :)

ADD REPLY

program still responds "Sequencing not finished".

I think you may be missing some critical files from the folder. Do you see these files in the folder?

RTAComplete.txt
RTARead1Complete.txt
RTARead2Complete.txt
RTARead3Complete.txt
RTARead4Complete.txt
SequencingComplete.txt
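A quick way to answer that question without eyeballing the directory is a small Python check like the sketch below. The file list is simply the one from this thread plus RunInfo.xml. Which of these files a complete run folder really contains depends on the instrument and on the number of reads, so treat the list as an assumption to adapt. `missing_run_files` is a hypothetical helper name.

```python
from pathlib import Path

# Files named in this thread. Which of them a complete run folder contains
# depends on the instrument and the number of reads, so this list is an
# assumption to adapt, not a definitive specification.
MARKER_FILES = [
    "RunInfo.xml",
    "RTAComplete.txt",
    "RTARead1Complete.txt",
    "RTARead2Complete.txt",
    "RTARead3Complete.txt",
    "RTARead4Complete.txt",
    "SequencingComplete.txt",
]


def missing_run_files(run_folder, expected=MARKER_FILES):
    """Return the expected files that are absent from the top of run_folder."""
    root = Path(run_folder)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_run_files("/path/to/run_folder")` returns a list of the names that were not found; an empty list means every file in the list is present.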
ADD REPLY

I am missing RTARead4Complete.txt and SequencingComplete.txt.

I can show you my raw data folder. By the way, I created the folder named "structured" and the SampleSheet.csv file myself.

1. The raw data folder (screenshot: raw data folder)

2. The config folder (screenshot: config file)

3. The Data/Intensities/BaseCalls folder contains two lane folders, L001 and L002. The sequencing data in these folders look like this (screenshot: .bcl files)

4. They also sent me additional QIAGEN library prep .xml files (with multiple sheets), where I found the indexes, sample names, etc. Basically, the information that I need to create the SampleSheet.csv file. (screenshot: QIAGEN library prep)

5. The SampleSheet.csv file created by me (screenshot: samplesheet.csv)

I hope it helps you to get an idea. I am lost, so thank you for your support.

ADD REPLY

If you actually have the full data folder, then you may have a single-index (1D) run instead of a dual-index (2D) run. Can you show us the lines of the <Reads> section in the RunInfo.xml file?

ADD REPLY

What is the difference between 1D and 2D runs? Is there any solution?

(screenshot attached)

ADD REPLY

You have a 151 bp single-end dual-index (2D, 8 bp each) run. That samplesheet should be ok.
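If you want to read the run layout off RunInfo.xml yourself, a short Python sketch like this one lists the reads and flags which are index reads. The embedded XML is a made-up example that mirrors the layout described here (one 151-cycle read plus two 8-cycle index reads); it is not the poster's file. The attribute names `Number`, `NumCycles` and `IsIndexedRead` are the ones I would expect in the `<Reads>` section, so check them against your own RunInfo.xml. `summarize_reads` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up RunInfo.xml fragment for illustration; not the poster's actual file.
EXAMPLE_RUN_INFO = """
<RunInfo>
  <Run Id="example_run">
    <Reads>
      <Read Number="1" NumCycles="151" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
    </Reads>
  </Run>
</RunInfo>
"""


def summarize_reads(run_info_xml):
    """Return one dict per <Read> element, sorted by read number."""
    root = ET.fromstring(run_info_xml)
    reads = []
    for read in root.iter("Read"):
        reads.append({
            "number": int(read.get("Number")),
            "cycles": int(read.get("NumCycles")),
            "is_index": read.get("IsIndexedRead") == "Y",
        })
    return sorted(reads, key=lambda r: r["number"])


reads = summarize_reads(EXAMPLE_RUN_INFO)
n_index = sum(r["is_index"] for r in reads)
n_data = len(reads) - n_index
# One data read means single-end; two index reads means dual-index (2D).
print(f"{n_data} sequencing read(s), {n_index} index read(s)")
```

For a real run, pass the file contents instead, for example `summarize_reads(open("RunInfo.xml").read())`.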

This appears to be a run on NextSeq 500 or 550 since you have bgzip compressed bcl files. AFAIK bcl2fastq should work for this run. I have never seen that particular error in many years of using bcl2fastq.

At this point it may be best to contact the provider that ran this sequencing and ask them to help you demultiplex. Or at least get a fresh copy of the data in case you have a corrupt data folder/files.

ADD REPLY

I have tried doing the same thing in the Linux virtual machine:

bcl2fastq --no-lane-splitting -R /home/linas/Desktop/210924_NB551189_0088_AHJ53MAFX2/ -o /home/linas/Desktop/210924_NB551189_0088_AHJ53MAFX2/structured --sample-sheet /home/linas/Desktop/210924_NB551189_0088_AHJ53MAFX2/SampleSheet.csv

However, now I get a different error:

[26be880] ERROR: bcl2fastq::common::Exception: 2021-Oct-15 11:13:56: No such file or directory (2): /TeamCityBuildAgent/work/556afd631a5b66d8/src/cxx/lib/layout/FileExistenceVerifier.cpp(212): Throw in function static void bcl2fastq::layout::FileExistenceVerifier::throwException(const string&, bcl2fastq::common::TileAggregationMode, bcl2fastq::common::LaneNumber, bcl2fastq::common::TileNumber)
Dynamic exception type: boost::exception_detail::clone_impl<bcl2fastq::common::IoError>
std::exception::what: Unable to find positions file for lane: 1

ADD REPLY

Your data folder is likely missing files or is somehow corrupted. Did you get a new copy from the provider?
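To see whether the "Unable to find positions file" error really comes from missing files, you can look for the cluster position files directly. The Python sketch below rests on an assumption about the layout: that positions are stored either as a single Data/Intensities/s.locs file or as .locs/.clocs files inside Data/Intensities/L00N, next to the BaseCalls folder. I believe that is where Illumina instruments put them, but verify it against the documentation for your instrument. `lanes_without_positions` is a hypothetical helper name.

```python
from pathlib import Path


def lanes_without_positions(run_folder):
    """Return lane folder names (e.g. "L001") that have no positions file.

    Assumed layout (verify for your instrument): either one shared
    Data/Intensities/s.locs, or *.locs / *.clocs files in
    Data/Intensities/L00N for every lane found under BaseCalls.
    """
    intensities = Path(run_folder) / "Data" / "Intensities"
    if (intensities / "s.locs").is_file():
        return []  # a single shared positions file covers every lane
    missing = []
    lane_dirs = sorted((intensities / "BaseCalls").glob("L[0-9][0-9][0-9]"))
    for lane_dir in lane_dirs:
        pos_dir = intensities / lane_dir.name
        found = []
        if pos_dir.is_dir():
            found = list(pos_dir.glob("*.locs")) + list(pos_dir.glob("*.clocs"))
        if not found:
            missing.append(lane_dir.name)
    return missing
```

If `lanes_without_positions("/path/to/run_folder")` lists L001, the copy you received is missing files outside BaseCalls, which would support asking the provider for the complete run folder.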

ADD REPLY

Don't install anything, this is almost certainly a problem of not having the full run folder.

ADD REPLY
