Question

human mapping fastq second pass error, how to reuse previously generated files, STAR


3.7 years ago

vascoambrogi • 0

In my quest to build my own mapping using the Fastq files, read 1 and 2, given to me by the sequencing intermediary I have:

checked the quality with FastQC: the per-base quality of the reads was very good, but I am still worried about the low diversity of the quality scores within the files. FastQC does not seem to report such a figure (I have not seen it in the report, nor found a FastQC option to calculate it);
used a reference genome downloaded from NCBI, GRCh38.p13, to build a new genome index directory. I was not aware of the GTF/GFF annotation files, which are apparently available as a direct download and might have saved me time (is there a place with more information on them, notably for human genome mapping?);
then launched the mapping with STAR, adding the option "--twopassMode Basic", on a machine with 16 CPU threads and 60 GB of RAM. The first pass completed after a couple of hours, but the second pass failed with a memory error while sorting the BAM. Here is the exit message:

Aug 15 04:35:56 ..... started sorting BAM
Max memory needed for sorting = 4537708471

EXITING because of FATAL ERROR: number of bytes expected from the BAM bin does not agree with the actual size on disk: Expected bin size=3846976240 ; size on disk=1328263168 ; bin number=47

Aug 15 04:37:12 ...... FATAL ERROR, exiting

I suppose there was not enough disk space. Once I have added space, is there a way to make STAR pick up the previously generated files and avoid a complete recalculation? I haven't seen anything about it in the documentation.

The command line used for mapping:

sudo nohup STAR \
--runThreadN 16 \
--readFilesIn ~/r1.fastq.gz ~/r2.fastq.gz \
--genomeDir ~/hg38_index \
--outFileNamePrefix polly \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--outSAMattributes Standard \
--twopassMode Basic

Wishing you a nice weekend,

alignment RNA-Seq software error sequencing • 671 views


score 0 · Answer 1 · 2020-08-16

I found an old post that explains it. For those not familiar with Linux: the number of open files allowed becomes an issue if your system applies the standard limit of 1024 (check it with the command "ulimit -n"). If you cannot change the limit, you will have to juggle the number of threads and the number of bins used to sort the output BAM file, keeping their product below the limit:

files_open = Bins * Threads
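
For anyone who wants to check this before launching a long run, here is a minimal sketch of the arithmetic above. It only compares the rough estimate Bins * Threads with the current "ulimit -n" value and does not call STAR at all. The helper name estimate_open_files and the value of 50 bins are my own assumptions for illustration; the 16 threads come from the command in the question. As far as I know the number of sorting bins is set in STAR with --outBAMsortingBinsN, but please verify that against the manual for your version.

```shell
#!/usr/bin/env bash
# Rough estimate from the formula above: files_open = Bins * Threads.
# estimate_open_files is a hypothetical helper, not part of STAR.
estimate_open_files() {
    local bins="$1" threads="$2"
    echo $(( bins * threads ))
}

bins=50      # assumed number of BAM sorting bins (illustrative value)
threads=16   # --runThreadN used in the question
needed=$(estimate_open_files "$bins" "$threads")
limit=$(ulimit -n)   # current soft limit on open files for this shell

echo "estimated open files: $needed, current limit: $limit"

# Only compare numerically when the limit is an actual number.
if [ "$limit" != "unlimited" ] && [ "$needed" -ge "$limit" ]; then
    echo "estimate reaches the limit: raise it (e.g. 'ulimit -n 4096') or lower bins/threads"
fi
```

Note that "ulimit -n" only affects the current shell and the processes started from it, so a raised limit has to be set in the same session that launches the mapping, and the system's hard limit may still cap it.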

Voilà, to the best of my knowledge the alignment completed successfully. Wishing you a good week,