STAR stalls after the first file when mapping multiple files in a loop
5.5 years ago
lingziqi8278 ▴ 10

Hello everyone. I have recently been using STAR to map reads from multiple files. Here is the script:

for NAME in individual1 individual2  individual3
do
    STAR --runMode alignReads \
         --runThreadN 10 \
         --genomeDir "$REF" \
         --readFilesIn "${INPUT}/${NAME}_input/${NAME}_input_R1.fq.gz" \
         --readFilesCommand zcat \
         --outSAMstrandField intronMotif \
         --outFileNamePrefix "${OUT}/${NAME}_input_wasp" \
         --outSAMtype BAM Unsorted \
         --varVCFfile "${OUTPUT}/${NAME}_input.vcf" \
         --waspOutputMode SAMtag \
         --outSAMattributes vA vG
done

The paths are definitely correct. The key problem is that once one file is done, the loop stops, with no warning at all. When I type "ps", it shows this:

  PID TTY          TIME CMD
19335 pts/0    00:00:00 bash
19384 pts/0    00:00:00 bash
19665 pts/0    00:36:35 STAR
19668 pts/0    00:00:00 sh <defunct>
19708 pts/0    00:00:00 ps

Only when I type "kill 19665" can the next file be processed. I have no idea what causes this, and it confuses me a lot. Could anyone tell me how to fix it? Thank you!
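To see which process a `<defunct>` entry belongs to, it can help to list the children of a given PID together with their state. The following is a minimal sketch, assuming a `ps` that accepts `-A` and `-o` (procps on Linux does); `children_of` is a hypothetical helper name, not part of STAR.

```shell
# List the direct children of a PID as "PID STATE COMMAND".
# A state starting with Z marks a zombie (shown as <defunct> by ps).
# children_of is a hypothetical helper, not a STAR or system command.
children_of() {
    local parent=$1
    # The "=" after each column name suppresses the header line.
    ps -A -o pid= -o ppid= -o stat= -o comm= |
        awk -v p="$parent" '$2 == p { print $1, $3, $4 }'
}

# Example with the PID from the listing above:
# children_of 19665
```

A zombie stays in the process table until its parent reaps it, so the parent PID shows which process is still holding it. STAR runs the `--readFilesCommand` through a shell, so a defunct `sh` under STAR is plausibly that helper, but this is an assumption and not something the thread confirms.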

RNA-Seq ChIP-Seq gene software error • 4.2k views

See my suggestion for a simple parallelization script (written for bowtie2, but I think you'll get the idea) in "A: perl script for BWA-mem on multiple different files".


Thanks! It seems useful; I will try it in my code.


Do ${OUT}/ and ${OUTPUT}/ exist before you run STAR?

How do you define $OUT and $OUTPUT?


They are defined like this:

OUTPUT=/safedisk/CHIP_Seq/PhaseI/5_platypus_vcf
OUT=/safedisk/CHIP_Seq/PhaseI/6_wasp_bam

These two directories hold the results of two different steps; ${OUT} is where I store my STAR results. By the way, I tested STAR with a single file, and the "defunct" process still appears.
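Because STAR writes its output under `--outFileNamePrefix`, a quick pre-flight check of the directories can rule out path problems before a long run. This is a minimal sketch; `require_dir` is a hypothetical helper, and `$OUT` and `$OUTPUT` are the variables defined above.

```shell
# Fail early if a directory is missing, or (optionally) not writable.
# require_dir is a hypothetical helper name used only for illustration.
require_dir() {
    local dir=$1 need_write=${2:-no}
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ "$need_write" = yes ] && [ ! -w "$dir" ]; then
        echo "directory not writable: $dir" >&2
        return 1
    fi
    return 0
}

# Example: $OUTPUT only needs to exist (the VCF is read from it),
# while $OUT must also be writable for the STAR results.
# require_dir "$OUTPUT" && require_dir "$OUT" yes || exit 1
```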

5.5 years ago
caggtaagtat ★ 1.9k

Hi,

I also run STAR in a loop and use two different ways to get the file names. One way is to pass the file names (with their paths) to STAR through a text file that holds one file name per line:

# For every name in the file
while read SAMPLE; do

# Get single file name
FILEBASE=$(basename "${SAMPLE%.fq.rm_bl}")

# Make new directory for every sample
mkdir /path_to_later/gap_table/$FILEBASE.STAR

# Enter the new directory
cd /path_to_later/gap_table/$FILEBASE.STAR

# Align with STAR 
/path_to_STAR/STAR --outFilterType BySJout --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignEndsType EndToEnd --runThreadN 8 --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 4 --alignIntronMax 300000 --alignSJoverhangMin 8 --alignIntronMin 20 --genomeDir /path_to/star_index_hg38_hiv_r100/ --sjdbOverhang 100 --quantMode GeneCounts --sjdbGTFfile /path_to/hg38_pnL43_fusion_annotation.gtf --outFileNamePrefix /path_to/gap_table/$FILEBASE.STAR/ --readFilesIn $SAMPLE > STARaligning.log

done </path_to_filename_file/filename

Another way would be to search within a directory for certain filenames, to use them subsequently in STAR as input:

Here the first line of the loop and the FILEBASE line of the code above are replaced with the following, and the closing line becomes a plain "done" because the input now comes from the pipe:

# For every file in the given directory (/path_to_files/) whose name ends in ".fq"
find /path_to_files/ -name "*.fq" | while read SAMPLE; do

# Get single file name
FILEBASE=$(basename "${SAMPLE%.fq}")

I suppose the extra space between individual2 and individual3 is not in the real code? Otherwise, I don't know the reason for the error in your particular kind of loop.
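The file-name handling in both variants comes down to two steps: `${SAMPLE%.fq}` strips a suffix and `basename` drops the directory part. Here is a minimal sketch of just that step, with hypothetical helpers `sample_base` and `list_samples` and made-up paths.

```shell
# Derive a sample name from a path by removing the directory and a suffix.
# sample_base is a hypothetical helper; the suffix defaults to ".fq".
sample_base() {
    local path=$1 suffix=${2:-.fq}
    basename "${path%"$suffix"}"
}

# Read one path per line from standard input and print each sample name.
# IFS= and -r keep leading whitespace and backslashes intact.
list_samples() {
    local sample
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue   # skip empty lines
        sample_base "$sample"
    done
}

# Example with made-up paths:
# printf '%s\n' /data/a.fq /data/b.fq | list_samples
# prints "a" and "b" on separate lines
```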


Thanks a lot for answering! There is no extra space between the sample names in the real code. I tested STAR with a single file, and when I type "ps" it looks like this:

PID TTY          TIME CMD
29037 pts/1    00:00:00 bash
29088 pts/1    00:00:00 ps

That looks normal. However, when I type "ps -ef | grep usr_name", it shows:

28999 28935  0 18:33 pts/0    00:00:00 bash 1_STAR_test.sh
29007 28999 99 18:33 pts/0    00:23:18 STAR --runMode alignReads --runThreadN 10 --genomeDir /home/zhuyl/Genome/susScr11_STAR_update --readFilesIn /safedisk2/lingziqi/phaseI/2019-5-13-36individual/BMX4_Liver_input/BMX4_Liver_input_R1.fq.gz --readFilesCommand zcat --outSAMstrandField intronMotif --outFileNamePrefix /safedisk/09_Encode/CHIP_Seq/PhaseI/BWA_bam/2019-5-13-36individual_lingziqi/6_wasp_bam/BMX4_Liver_input_wasp --outSAMtype BAM Unsorted --varVCFfile /safedisk/09_Encode/CHIP_Seq/PhaseI/BWA_bam/2019-5-13-36individual_lingziqi/platypus_vcf/BMX4_Liver_input.vcf --waspOutputMode SAMtag --outSAMattributes vA vG
29010 29007  0 18:33 pts/0    00:00:00 [sh] <defunct>

So I guess it is not about the loop; maybe STAR just can't exit normally when it finishes the job? Have you ever encountered this issue before?
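One way to tell whether STAR has actually finished mapping is to look at its log files: it writes `Log.progress.out` during the run and the summary statistics to `Log.final.out` once mapping is complete, both named with the `--outFileNamePrefix`. The following is a minimal sketch; `star_finished` is a hypothetical helper, and the prefix in the example is a placeholder.

```shell
# Return success if STAR's final summary log exists and is non-empty.
# star_finished is a hypothetical helper; pass the same string that was
# given to --outFileNamePrefix.
star_finished() {
    local prefix=$1
    # -s is true only for a file that exists and has a size above zero.
    [ -s "${prefix}Log.final.out" ]
}

# Example (placeholder prefix):
# if star_finished "${OUT}/BMX4_Liver_input_wasp"; then
#     echo "mapping finished"
# else
#     echo "still running or failed; see Log.out and Log.progress.out"
# fi
```

If the summary is present while the STAR process is still alive, the mapping itself is done and the process is stuck during shutdown; if it is missing, STAR is still working or has failed earlier.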


No, sorry, never. Are you sure you provided the 30 GB of RAM you need for aligning with STAR?


Hi caggtaagtat, I am happy to tell you that I have figured out what was going on. It was a RAM issue. The problem happened because I set 10 threads, which was probably too many for the available memory. I hope this helps other people who run into the same problem.
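For anyone who suspects the same cause, the available memory can be checked before launching STAR. This is a minimal sketch, assuming Linux with a `MemAvailable` line in `/proc/meminfo`; `mem_available_gb` and `enough_memory` are hypothetical helpers, and the 30 GB threshold in the example is only the figure mentioned in this thread, not a measured requirement.

```shell
# Print available memory in whole GB, read from a meminfo-style file.
# mem_available_gb is a hypothetical helper; the file defaults to /proc/meminfo.
mem_available_gb() {
    local meminfo=${1:-/proc/meminfo}
    # MemAvailable is reported in kB; int() rounds down to whole GB.
    awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' "$meminfo"
}

# Return success if at least the requested number of GB is available.
enough_memory() {
    local needed_gb=$1 meminfo=${2:-/proc/meminfo}
    local have_gb
    have_gb=$(mem_available_gb "$meminfo")
    [ -n "$have_gb" ] && [ "$have_gb" -ge "$needed_gb" ]
}

# Example: refuse to start if less than 30 GB is available.
# enough_memory 30 || { echo "not enough free RAM for STAR" >&2; exit 1; }
```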


Yes, total RAM is 60 GB. Anyway, thanks for helping me. ^o^
