Merge fastq file from same lane and start bowtie alignment
7.7 years ago

Hi all,

I am trying to automate a couple of operations. Basically, I have a directory of paired-end (PE) fastq files from the same sample, e.g.:

L16-24MG-A_S14_L001_R1_001.fastq.gz  L16-24MG-A_S14_L003_R1_001.fastq.gz
L16-24MG-A_S14_L002_R1_001.fastq.gz  L16-24MG-A_S14_L004_R1_001.fastq.gz

L16-24MG-A_S14_L001_R2_001.fastq.gz  L16-24MG-A_S14_L003_R2_001.fastq.gz
L16-24MG-A_S14_L002_R2_001.fastq.gz  L16-24MG-A_S14_L004_R2_001.fastq.gz

I am trying to cat the fastq files together and then run bowtie, so that at the end I have only one .bam file. This is my script so far, but it's not quite working. In fact, I am not even able to obtain a combined fastq file before bowtie. Can you help me out, please?

for i in $(ls *R1*.gz) do cat *R1* > ${i%.R1_combined.fastq}.gz  done

for i in $(ls *R2*.gz) do cat *R2* > ${i%.R2_combined.fastq}.gz  done

gunzip *.gz

for i in $(ls *.fastq | rev | cut -c 13- | rev | uniq)

do

bowtie  /home/casaburi/ufrc/hybrid_pacbio_global/rsem/final_assembly_cdhit100 \
-1 ${i}_R1_combined.fastq -2 ${i}_R2_combined.fastq \
--all --best --strata -m 300 --chunkmbs 512 -S -p 10 | samtools view -F 4 -S -b -o ${i}.bam

done
bowtie RNA-Seq • 3.2k views

I have formatted your code correctly. In future, highlight the text you want to format as code and use the code-formatting button in the editor.

For one, you are not using ; to terminate your shell statements.

for i in $(ls *R1*.gz); do cat *R1* > ${i%.R1_combined.fastq}.gz; done

Thanks genomax, and sorry for the missing formatting. I still do not get a concatenated .gz file. Instead, I get this:

L16-24MG-A_S14_L001_R1_001.fastq.gz L16-24MG-A_S14_L002_R1_001.fastq.gz.gz  L16-24MG-A_S14_L004_R1_001.fastq.gz
L16-24MG-A_S14_L001_R1_001.fastq.gz.gz  L16-24MG-A_S14_L003_R1_001.fastq.gz L16-24MG-A_S14_L004_R1_001.fastq.gz.gz
L16-24MG-A_S14_L002_R1_001.fastq.gz L16-24MG-A_S14_L003_R1_001.fastq.gz.gz

Which is not what I am looking for. I am looking to have only this:

L16-24MG_R1_combined.fastq.gz

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Your original files will not disappear, since you are only reading them with cat to make the large file. So you should expect the combined file in addition to the originals.

You could align the four pieces in parallel and then merge the BAM files afterwards.


I know that the original files will still be there, but the combined file (which is the whole point of this post) is not appearing at this stage.


A simple cat L16-24MG-A_S14*R1* > L16-24MG-A_S14.combined.fastq.gz should be sufficient for that purpose (if that is all the files you have).
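Plain cat is enough here because the gzip format allows several compressed members back to back, and decompressing such a file gives the concatenation of the parts. A minimal sketch of that, using throwaway files in a temporary directory (the demo_* names and the read contents are made up for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so no real data is touched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two tiny stand-in "lanes" (contents are invented for this demo).
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/demo_L001_R1_001.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/demo_L002_R1_001.fastq.gz"

# Concatenate the compressed files directly; no gunzip is needed first.
cat "$tmp"/demo_L00*_R1_001.fastq.gz > "$tmp/demo_R1_combined.fastq.gz"

# Decompressing the result yields both records, in glob (lane) order.
combined=$(gunzip -c "$tmp/demo_R1_combined.fastq.gz")
echo "$combined"
```

Because the glob expands in sorted order, the same pattern with R2 in place of R1 should concatenate the lanes in the same order, which keeps the mates in sync.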


Right, but I have multiple files in different folders, so I was planning to just run the same script in every folder. That's why I was looking for something that could also derive the output name from the input name (the ${i%...} part); otherwise I have to manually edit the script every time to match the input.
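For deriving the output name from the input name, the relevant piece is the shell's suffix stripping. The sketch below shows why the ${i%.R1_combined.fastq}.gz form leaves the name unchanged, and one possible way to cut the sample prefix out of a file name. It assumes Illumina-style names in which the sample name is followed by _S and a number; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

f="L16-24MG-A_S14_L001_R1_001.fastq.gz"

# '%' strips a suffix only when the pattern matches the end of the value.
# This name does not end in ".R1_combined.fastq", so it comes back unchanged
# and appending ".gz" produces the ".fastq.gz.gz" names seen earlier.
unchanged="${f%.R1_combined.fastq}.gz"
echo "$unchanged"   # L16-24MG-A_S14_L001_R1_001.fastq.gz.gz

# '%%' removes the longest matching suffix. Cutting from "_S<digit>" onward
# leaves the sample prefix, whatever the sample index or lane is.
sample="${f%%_S[0-9]*}"
echo "$sample"      # L16-24MG-A

# The prefix can then be used to build one output name per read direction.
out="${sample}_R1_combined.fastq.gz"
echo "$out"         # L16-24MG-A_R1_combined.fastq.gz
```

A sample name that itself contains _S followed by a digit would be cut too early by this pattern, so the pattern would need adjusting for such names.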


No one can answer this?

7.7 years ago
GenoMax 148k

Try:

for i in $(ls -1 L16*R1*); do cat ${i%%_R1*}_R1_001.fastq.gz >> ${i%%_S14*}_combined.fastq.gz; done
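For running the same thing unchanged in every folder, a slightly more general variant might look like the sketch below. It is not the one-liner above but a hypothetical helper (merge_lanes is a made-up name) that assumes paired-end files named <sample>_S<n>_L<lane>_R1_001.fastq.gz and <sample>_S<n>_L<lane>_R2_001.fastq.gz, finds the sample prefixes itself, and writes one combined file per read direction.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_lanes DIR (hypothetical helper): for every sample prefix found in DIR,
# concatenate its per-lane files into <sample>_R1_combined.fastq.gz and
# <sample>_R2_combined.fastq.gz inside DIR.
merge_lanes() {
    local dir=$1 f sample mate
    # List the unique sample prefixes: the part of each R1 name before "_S<digit>".
    for f in "$dir"/*_L[0-9][0-9][0-9]_R1_001.fastq.gz; do
        [ -e "$f" ] || continue              # the glob matched nothing
        f=${f##*/}                           # drop the directory part
        printf '%s\n' "${f%%_S[0-9]*}"
    done | sort -u |
    while IFS= read -r sample; do
        for mate in R1 R2; do
            # The glob expands in sorted order, so R1 and R2 lanes line up.
            # '>' overwrites, so re-running does not append the data twice.
            cat "$dir/${sample}"_S[0-9]*_L[0-9][0-9][0-9]_"${mate}"_001.fastq.gz \
                > "$dir/${sample}_${mate}_combined.fastq.gz"
        done
    done
}

# Merge in the directory given as first argument, or the current one.
merge_lanes "${1:-.}"
```

To cover several folders, something like for d in */; do merge_lanes "$d"; done could replace the last line. The combined names do not match the per-lane pattern, so they are not picked up as input on a second run. If a sample has no R2 files, cat fails and the script stops, which seems preferable to silently producing a single-end result.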


Not working, thank you though!


What is not working? Are you getting an error?


Never mind, it's actually working, my bad! Thank you so much for your help!


Thanks for confirming. "Accept" this answer (green check mark) to provide closure to the thread.
