Hello, I am a student who recently started studying bioinformatics. My understanding is still limited, so I would appreciate an explanation even if the question is basic. I am working with RNA-seq data that was processed with different pipelines and workflows, and the resulting batch effects are not reduced even with the ComBat method. Therefore, I would like to standardize the analysis using the workflow available on the GDC portal. The code is provided at https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/.
I have already downloaded the reference sequence file (GRCh.38.d1.vd1.fa.tar.gz) and the annotation file (gencode.v36.annotation.gtf.gz) from the GDC reference files page (https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files).
### Step 1: Building the STAR index.
apps/STAR \
--runMode genomeGenerate \
--genomeDir STAR_genomeGenerate \
--genomeFastaFiles GRCh.38.d1.vd1.fa \
--sjdbOverhang 100 \
--sjdbGTFfile gencode.v36.annotation.gtf \
--runThreadN 8
This creates STAR_genomeGenerate/ and GenomeDir.
### Step 2: Alignment 1st Pass.
apps/STAR \
--genomeDir STAR_genomeGenerate \
--readFilesIn a_1.fastq.gz b_1.fastq.gz c_1.fastq.gz a_2.fastq.gz b_2.fastq.gz c_2.fastq.gz \
--runThreadN 8 \
--outFilterMultimapScoreRange 1 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 10 \
--alignIntronMax 500000 \
--alignMatesGapMax 1000000 \
--sjdbScore 2 \
--alignSJDBoverhangMin 1 \
--genomeLoad NoSharedMemory \
--readFilesCommand zcat \
--outFilterMatchNminOverLread 0.33 \
--outFilterScoreMinOverLread 0.33 \
--sjdbOverhang 100 \
--outSAMstrandField intronMotif \
--outSAMtype None \
--outSAMmode None
However, when I tried to pass multiple fastq.gz files to --readFilesIn as in the code above, I got the error "Segmentation fault (core dumped)", so I had to run the samples one by one. Each run produces SJ.out.tab, Log.out, Log.progress.out, and Log.final.out. SJ.out.tab is the input for the next step.
However, each time I repeat Step 2, a new SJ.out.tab file is generated and the previous one is overwritten. The next step, Step 3, generates an intermediate index, but I am unsure how to incorporate the SJ.out.tab files there.
I would greatly appreciate it if you could provide an explanation for the issue in question.
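On the Step 3 question, here is a minimal sketch of how I would build the intermediate index. As I read the GDC page, the two-pass procedure is run per sample, so each sample's own SJ.out.tab goes into its own intermediate index via --sjdbFileChrStartEnd. The pass1/<sample>/ and index_pass2/<sample>/ paths are hypothetical names I made up for illustration, and the sketch assumes each SJ.out.tab was kept in a separate directory. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: one intermediate index per sample, built from that sample's SJ.out.tab.
# Assumption: pass1/<sample>/SJ.out.tab and index_pass2/<sample>/ are
# hypothetical paths, not something the GDC documentation prescribes.

STAR_BIN="apps/STAR"               # path used in Step 1 of the question
GENOME_FASTA="GRCh.38.d1.vd1.fa"   # file name as written in the question
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print commands; 0 = run STAR

# Where the 1st-pass junction file of a sample is assumed to live.
sj_file() { printf 'pass1/%s/SJ.out.tab' "$1"; }

# Where the intermediate index of a sample will be written.
index_dir() { printf 'index_pass2/%s' "$1"; }

# Print the genomeGenerate command for one sample.
intermediate_index_cmd() {
    printf '%s --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbOverhang 100 --runThreadN 8 --sjdbFileChrStartEnd %s' \
        "$STAR_BIN" "$(index_dir "$1")" "$GENOME_FASTA" "$(sj_file "$1")"
}

for sample in a b c; do
    cmd="$(intermediate_index_cmd "$sample")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        mkdir -p "$(index_dir "$sample")"
        $cmd    # relies on word splitting; paths must not contain spaces
    fi
done
```

Run it once as is to inspect the printed commands, then with DRY_RUN=0 to execute them. Building a human index for every sample needs a lot of memory and disk space, so it may be worth trying a single sample first.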
Hi there, the "Segmentation fault (core dumped)" error seems to be related to memory issues. I would check the core dump created by the software to determine the exact errors (maybe you could post its content here as well?).
Are files a, b, and c all from the same sample? Or are you trying to align three different samples together?
a, b, and c are all different samples. I have always used the '--readFilesIn a_1.fastq.gz a_2.fastq.gz' format for the --readFilesIn option. However, I noticed on this website that it seems possible to input multiple files at once, so I tried the code format above.
I don't think it means running three totally different samples at the same time. I think it means R1 and R2 of one sample.
What do I need to show you?
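To make that concrete: as far as I know, --readFilesIn takes the read 1 file(s), a space, then the read 2 file(s), and several files of the same sample (for example, lanes) are joined with commas rather than spaces. For separate samples, a per-sample loop with --outFileNamePrefix keeps each SJ.out.tab from being overwritten. Below is a rough sketch, assuming files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz. The pass1/ directory layout is a hypothetical choice of mine, and only a few of the Step 2 options are shown, so the remaining ones from the question would need to be added unchanged.

```shell
#!/bin/sh
# Sketch: run the 1st-pass alignment once per sample, each with its own
# output prefix so SJ.out.tab is not overwritten by the next run.
# Assumption: pass1/<sample>/ is a hypothetical layout chosen for illustration.

STAR_BIN="apps/STAR"             # path used in Step 1 of the question
GENOME_DIR="STAR_genomeGenerate"
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands; 0 = run STAR

# Output prefix for one sample; STAR writes <prefix>SJ.out.tab, <prefix>Log.out, ...
sample_prefix() { printf 'pass1/%s/' "$1"; }

# Print the 1st-pass command for one paired-end sample.
# Only part of the Step 2 option list is included here.
first_pass_cmd() {
    printf '%s --genomeDir %s --readFilesIn %s_1.fastq.gz %s_2.fastq.gz --readFilesCommand zcat --runThreadN 8 --outSAMtype None --outSAMmode None --outFileNamePrefix %s' \
        "$STAR_BIN" "$GENOME_DIR" "$1" "$1" "$(sample_prefix "$1")"
}

for sample in a b c; do
    cmd="$(first_pass_cmd "$sample")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        mkdir -p "$(sample_prefix "$sample")"   # make sure the prefix directory exists
        $cmd    # relies on word splitting; paths must not contain spaces
    fi
done
```

With the default DRY_RUN=1 this only prints the three commands, which is a cheap way to check the file pairing before starting a long run.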