Bcl to fastq Conversion Problem
9.2 years ago
BioRyder ▴ 220

Hello,

Below is the summary of a BCL to FASTQ conversion run with bcl2fastq v2.16.0.10. In the output folder, only lanes 4 and 5 have R1 and R2 read files with data (18 GB); the R1 and R2 files for all remaining lanes (1, 2, 3, 6, 7, 8) have zero size. However, the last four columns of the lane summary show that data were generated for every lane. Can anyone tell me why the data are missing even though the report shows that data were generated?

Flowcell Summary

Clusters (Raw)    Clusters (PF)    Yield (MBases)
3,861,446,400     2,333,759,444    704,795

Lane Summary

Lane       Raw data        Filtered data

#          Clusters        % of the         % Perfect    % One mismatch    Clusters        Yield       % PF       % >= Q30      Mean Quality
                           lane             barcode      barcode                           (Mbases)    Clusters   bases         Score

1          482,680,800     100.00           0.00         0.00              315,981,885     95,427      65.46      87.52         37.35
2          482,680,800     100.00           0.00         0.00              294,018,812     88,794      60.91      87.41         37.31
3          482,680,800     100.00           0.00         0.00              306,362,193     92,521      63.47      86.42         37.07
4          482,680,800     100.00           0.00         0.00              312,679,038     94,429      64.78      87.17         37.22
5          482,680,800     100.00           0.00         0.00              269,671,083     81,441      55.87      84.35         36.42
6          482,680,800     100.00           0.00         0.00              302,801,644     91,446      62.73      84.90         36.60
7          482,680,800     100.00           0.00         0.00              218,432,818     65,967      45.25      77.27         34.33
8          482,680,800     100.00           0.00         0.00              313,811,971     94,771      65.01      86.58         37.12
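
For anyone wanting to check their own output folder for the same symptom, here is a minimal Python sketch that totals FASTQ sizes per lane and read and lists the combinations that came out empty. It assumes the files follow the usual bcl2fastq v2 naming pattern (`<sample>_S<n>_L00<lane>_R<read>_001.fastq.gz`); the thread does not show the actual file names, so treat the pattern and the function names as illustrative.

```python
import re
from pathlib import Path

# Assumed naming pattern: <sample>_S<n>_L<lane>_R<read>_<chunk>.fastq.gz
# The regex extracts the lane (e.g. 001) and the read (e.g. R1).
FASTQ_NAME = re.compile(r"_L(\d{3})_(R\d)_\d{3}\.fastq\.gz$")


def fastq_sizes_by_lane(output_dir):
    """Return {(lane, read): total_size_in_bytes} for FASTQs under output_dir."""
    sizes = {}
    for path in Path(output_dir).rglob("*.fastq.gz"):
        match = FASTQ_NAME.search(path.name)
        if match is None:
            continue  # file name does not follow the assumed pattern
        key = (int(match.group(1)), match.group(2))
        sizes[key] = sizes.get(key, 0) + path.stat().st_size
    return sizes


def empty_outputs(sizes):
    """Return the sorted (lane, read) pairs whose total size is zero bytes."""
    return sorted(key for key, size in sizes.items() if size == 0)
```

Calling `empty_outputs(fastq_sizes_by_lane("/path/to/output"))` (the path is a placeholder) lists the lane and read pairs that have zero-byte files, which can then be compared against the lane summary in the report.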

Is everything ending up in the Undetermined_*.fastq.gz files? That would explain the results. This would indicate that your sample sheet is incorrect.


Hello Devon Ryan,

There are no Undetermined_*.fastq.gz files in the output directory, because I did not specify any index in SampleSheet.csv for the BCL to FASTQ conversion.

Below is the sample sheet.

[Header],,,,,,,,
IEMFileVersion,4,,,,,,,
Date,1/12/15,,,,,,,
Workflow,GenerateFASTQ,,,,,,,
Application,HISeq FASTQ Only,,,,,,,
Assay,,,,,,,,
Description,,,,,,,,
Chemistry,Default,,,,,,,
,,,,,,,,
[Reads],,,,,,,,
151,,,,,,,,
151,,,,,,,,
,,,,,,,,
[Settings],,,,,,,,
ReverseComplement,0,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,
,,,,,,,,
[Data],,,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
1,PhiX_Sample_Stock,PhiX_Sample_Stock,,,,,PhiX,
2,Spix_44G,Spix_44G,,,,,Spix_Macaw,
3,Spix_45G,Spix_45G,,,,,Spix_Macaw,
4,PhiX_Sample_Stock,PhiX_Sample_Stock,,,,,PhiX,
5,Spix_73G,Spix_73G,,,,,Spix_Macaw,
6,Spix_74G,Spix_74G,,,,,Spix_Macaw,
7,Spix_95G,Spix_95G,,,,,Spix_Macaw,
8,Spix_109G,Spix_109G,,,,,Spix_Macaw,
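
To confirm programmatically that no lane carries an index (which is why no demultiplexing, and hence no Undetermined file, would be expected), a sample sheet like the one above can be parsed with a short Python sketch. The function names are hypothetical, and the code only assumes the `[Data]` section layout and the `Lane` and `index` column names shown in this sheet.

```python
import csv
import io


def read_data_section(sheet_text):
    """Return the rows of the [Data] section as a list of dicts."""
    lines = sheet_text.splitlines()
    # Locate the [Data] marker; the line after it is the column header.
    start = next(
        i for i, line in enumerate(lines)
        if line.split(",")[0].strip() == "[Data]"
    )
    reader = csv.DictReader(io.StringIO("\n".join(lines[start + 1:])))
    # Drop rows that contain only empty fields.
    return [row for row in reader if any(row.values())]


def lanes_without_index(rows):
    """Return the lane numbers whose 'index' column is empty."""
    return sorted(
        int(row["Lane"]) for row in rows
        if not (row.get("index") or "").strip()
    )
```

For the sheet in this post, `lanes_without_index` would be expected to report all eight lanes, since the `index` column is blank in every row.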

Presumably it's hitting an error; check the log file.


We haven't switched to v2, but v1 would exhibit this behavior if there were missing .bcl or .stats files. You can add the relevant flags (in v1, I believe it's --ignore-missing-stats --ignore-missing-bcl) and try again.
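
Before reaching for the ignore flags, it may be worth checking whether base call files are actually missing. The Python sketch below counts `.bcl` / `.bcl.gz` files in each cycle directory of a lane and flags cycles that hold fewer files than the fullest one. It assumes a run folder layout of `BaseCalls/L00<lane>/C<cycle>.1/`, which is not shown anywhere in this thread, and the function names are made up for illustration.

```python
from pathlib import Path


def bcl_counts(basecalls_dir, lane):
    """Return {cycle_dir_name: number_of_bcl_files} for one lane."""
    # Assumed layout: <basecalls_dir>/L001/C1.1/<tile>.bcl or <tile>.bcl.gz
    lane_dir = Path(basecalls_dir) / f"L{lane:03d}"
    counts = {}
    for cycle_dir in sorted(lane_dir.glob("C*.1")):
        counts[cycle_dir.name] = sum(
            1 for f in cycle_dir.iterdir()
            if f.name.endswith((".bcl", ".bcl.gz"))
        )
    return counts


def cycles_with_missing_bcl(counts):
    """Return cycle dirs holding fewer BCL files than the fullest cycle."""
    if not counts:
        return []
    expected = max(counts.values())
    return sorted(name for name, n in counts.items() if n < expected)
```

This only detects gaps relative to the other cycles in the same lane; a cycle directory that is absent altogether, or a tile missing from every cycle, would not be flagged.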

ADD REPLY
9.1 years ago
BioRyder ▴ 220

Hi All,

We have identified the problem. It is caused by the file system on the Linux server. bcl2fastq v2 works properly on an XFS file system, but on a GPFS file system on the same kind of Linux server it generates partial output: R1, or both R1 and R2, is missing. We have contacted Illumina and reported this. They are checking the bcl2fastq issue with the GPFS file system internally.
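
For anyone unsure which file system their output directory lives on, here is a rough Python sketch that looks it up from `/proc/mounts`-style text. It assumes a Linux host and that GPFS mounts are listed with the type string `gpfs`, which should be verified on the actual system; mount points containing escaped spaces are not handled. The function names are illustrative only.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, file system type, options, ...
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, entries):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.abspath(path)
    best = ("", None)
    for mount_point, fs_type in entries:
        prefix = mount_point.rstrip("/") + "/"
        # '>=' lets a later mount on the same mount point win.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best[1]
```

On a real system this could be called as `fs_type_for(out_dir, parse_mounts(open("/proc/mounts").read()))`, where `out_dir` is the bcl2fastq output directory.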


Wow, that's kind of crazy. Thanks for reporting back!


What kind of hardware are you using gpfs on? Is this on a cluster that uses a job scheduler?


It is on a cluster that uses the SLURM scheduler.


GPFS is not our favorite either but we have not had gross issues like missing files in bulk. It sounds like bcl2fastq/SLURM both think that the jobs are completing properly but the files are missing on storage. Is the storage hardware fully patched/has latest firmware? Have you noticed this problem with other software/processes?

9.1 years ago
BioRyder ▴ 220

Hello All,

Below is the reply from Illumina about the GPFS problem described above. I hope it will be helpful for others.

As far as I understand, GPFS, while supported by CentOS, is not a default FS, and the OS needs to be reconfigured to use it. Illumina development teams use CentOS for the development and validation of our Linux-based software, though only with standard settings. I will, however, make some internal enquiries to see whether the GPFS file system has been tested. Having made these enquiries, I can confirm that we have had reports that GPFS does not handle BCL2FASTQ processes very well. This is due to the sheer number of small files that need to be loaded and processed.
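
To get a feel for how many small files a run folder contains, a quick Python sketch along these lines could be used. The 1 MB threshold is an arbitrary assumption chosen for illustration, not a figure from Illumina, and the function name is hypothetical.

```python
import os


def small_file_stats(root, threshold_bytes=1_000_000):
    """Return (total_files, files_smaller_than_threshold) under root."""
    total = 0
    small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += 1
            if os.path.getsize(os.path.join(dirpath, name)) < threshold_bytes:
                small += 1
    return total, small
```

Running it on the run folder gives a rough count of the files bcl2fastq has to open, which may help when discussing the load with the storage administrators.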
