bwa for multiple fastq files
1
0
Entering edit mode
6.8 years ago
dmotelb87 ▴ 20

I would like to do bwa annotation for forward and reverse reads from different samples against a reference genome using for loap

I have different samples like:

A2_forward.fastq A2_reverse.fastq A3_forward.fastq A3_reverse.fastq A4_forward.fastq A4_reverse.fastq

Also, I need the output for annotation of each sample in separate, as I need to compare the effects of different treatments in different samples.

bwa aligner • 12k views
ADD COMMENT
2
Entering edit mode

Thanks Suraj. Yes. This what I mean. It is working now.

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLY
9
Entering edit mode
6.8 years ago
MSM55 ▴ 160

Do you men to say, you want to do mapping of different sample against your reference genome ? You can use following script for mapping multiple sample using bwa. (Note: save the below script in .sh format (say "mapping.sh") and make sure that are no other fastq file rather then your sample should be there in your working directory)

script for mapping using bwa mem

total_files=`find -name '*.fastq' | wc -l`
arr=( $(ls *.fastq) )
echo "mapping started" >> map.log
echo "---------------" >> map.log

for ((i=0; i<$total_files; i+=2))
{
sample_name=`echo ${arr[$i]} | awk -F "_" '{print $1}'`
echo "[mapping running for] $sample_name"
printf "\n"
echo "bwa mem -t 12 A1_denovo_assembly.fasta ${arr[$i]} ${arr[$i+1]} > $sample_name.sam" >> map.log
bwa mem -t 12 A1_denovo_assembly.fasta ${arr[$i]} ${arr[$i+1]} > $sample_name.sam
}
ADD COMMENT
0
Entering edit mode

Some elegant bash / awk here. +1

ADD REPLY
0
Entering edit mode

I am very new to bash but think this code is exactly what I need (same problem). Would it be possible to step through what the for loop is doing? Thank you

ADD REPLY
0
Entering edit mode

Follow along using the numbers below as line numbers to the script above.

  1. Counting number of files that end in *.fastq. If your files have different endings then substitute accordingly.
  2. Assigning the names of the files to an array called arr.
  3. Just printing a message to screen and append to log file called map.log.
  4. Just printing a message to screen and append to log file called map.log.
  5. A for loop to step through the array taking two file names at a time (this is a paired-end dataset so forward and reverse pair, will need to change depending on your use case) until you take all elements from array. Array elements start at position [0].
  6. sample_name extracts the actual name of the sample (stuff before first _ e.g. Sample1 from Sample1_fastq)
  7. Printing the name you are working with to screen with echo
  8. Printing the actula command you are working with. Name of output file created using $sample_name as $sample_name.sam.
  9. Run the actual bwa mem command.
ADD REPLY
0
Entering edit mode

How to call input files from a folder with paired end input files for this array?

ADD REPLY
0
Entering edit mode

Changing the first two lines to may be all you need:

total_files=`find -name 'folder_name/*.fastq' | wc -l`
arr=( $(ls folder_name/*.fastq) )
ADD REPLY
0
Entering edit mode

Thanks! But it gives error as Unix filenames usually don't contain slashes

ADD REPLY
0
Entering edit mode

I tried this

total_files=`find -wholename './sample_data/*.fastq' | wc -l`
arr=( $(ls ./sample_data/*.fastq) )

But then bwa mem runs for two pairs of input files in the folder but output only one file named sample.bam However, I am expecting two output files, A1.bam and B1.bam

ADD REPLY
0
Entering edit mode

I would try replacing this line instead:

total_files=`ls /sample_data/*.fastq | wc -l`

Keep an eye on

echo "[mapping running for] $sample_name"

to make sure sample names are being printed properly. You should get as many BAM files as there are pairs of sample files.

ADD REPLY
0
Entering edit mode

So I tried

total_files=`ls ./sample_data/*.fastq | wc -l`
arr=( $(ls ./sample_data/*.fastq) )
for ((i=0; i<$total_files; i+=2))
{
sample_name=`echo ${arr[$i]} | awk -F "_" '{print $1}'`
echo "[mapping running for] $sample_name"
printf "\n"
bwa mem -t 12 ref.fa ${arr[$i]} ${arr[$i+1]} | samtools sort -o output/$sample_name.sorted.bam

Echo says

[mapping running for] ./sample

And got one sorted.bam file not with the name of input file but as sample.sorted.bam

It is reading sample as a file name which is a problem.

ADD REPLY
0
Entering edit mode

It looks like your samples names don't follow the format (Sample_R1.fastq) expected above. What do they look like?

ADD REPLY
0
Entering edit mode

My sample names are different

1_R1.fastq 1_R2.fastq 2_R1.fastq 2_R2.fastq

But the same script runs if the same samples are in the directory itself rather than a folder in the same directory. But I want to have my input files in the folder and it does not work.

ADD REPLY
0
Entering edit mode

Your script is missing a closing curly bracket (brace) after the final command

ADD REPLY

Login before adding your answer.

Traffic: 2539 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6