This question may be better suited for another forum, in which case I am super sorry! I am new to bioinformatics, but I could really use some help right now!
I have struggled to write a for loop that iterates through two text files, both of which I made. The first text file, "new-bins_list.txt", is just a list of all of the bins that need to be used as the ref file. The second text file I would like to iterate through, "new_names_metalist", contains the first part of each file name in "new-bins_list.txt"; this is the identifier of the trimmed reads file that I would like to recruit to. Here is the first portion of each of those files, to help illustrate what I am describing:
$ head new-bins_list.txt
SRR4101185.maxbin.008.fasta
SRR1633224.maxbin.021.fasta
SRR1986369.maxbin.004.fasta
SRR1971621.maxbin.012.fasta
SRR2058405.maxbin.006.fasta
SRR1636509.maxbin.009.fasta
SRR1636517.maxbin.016.fasta
SRR4048936.maxbin.001.fasta
SRR4101185.maxbin.041.fasta
SRR1995427.maxbin.002.fasta
$ head new_names_metalist
SRR4101185
SRR1633224
SRR1986369
SRR1971621
SRR2058405
SRR1636509
SRR1636517
SRR4048936
SRR4101185
SRR1995427
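Because new_names_metalist is just the first part of each name in new-bins_list.txt, cut can regenerate it from the bin list, and diff can confirm that the two files agree line by line. A small sketch, where check_lists is a hypothetical helper name and the only assumption is that the ID is everything before the first ".":

```shell
# Hypothetical helper: is file $2 equal to the text before the first "." on each line of file $1?
# cut -d. -f1 keeps field 1 of "."-separated text,
# e.g. SRR4101185.maxbin.008.fasta -> SRR4101185
check_lists() {
    # diff reads the cut output from stdin ("-") and exits 0 only if nothing differs
    if cut -d. -f1 "$1" | diff -q - "$2" > /dev/null; then
        echo "lists match line by line"
    else
        echo "lists differ"
    fi
}
```

Usage: check_lists new-bins_list.txt new_names_metalist. If the lists match, the second file carries no extra information, and the read ID can be derived from the bin name itself.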
This is the for loop that I have tried to create to iterate through each bin file and its coinciding trimmed read file. I recognize that I could write a shell script by hand, but we have several hundred bins, which would make writing it extremely tedious. The for loop runs; the issue is that it just keeps iterating through the exact same bin file, SRR4101185.maxbin.008.fasta, and compares it to the different metagenome read files listed in new_names_metalist. While I am excited to get anything from this loop, I wish it would actually work! Any suggestions that you could share with me would be greatly appreciated.
for bin in $(cat new-bins_list.txt); do
    for read in $(cat new_names_metalist); do
        bbmap.sh ref=/storage/sequence_samples/Crump_lab_CoDL/ALL_MAGs/${bin} nodisk \
            in=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/${read}_1.cleaned.fq.gz \
            in2=/storage/sequence_samples/Crump_lab_CoDL/SRA_trim_out/${read}_2.cleaned.fq.gz \
            covstats=coverage_stats_SRR-SRR/${bin}.vs.SELF
    done
done
Please let me know if I need to make any clarifications to this post! Thank you so much for taking the time to read this!
I think the loop is working fine. Are you sure it is not? Putting echo in front of bbmap.sh should print all of the command lines instead of running them. You can check and verify them, then remove echo to run the loop for real. Here is an example of what I get.
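A minimal sketch of that dry run, assuming the loop from the question is wrapped in a hypothetical function, dry_run_nested, so the two list files can be passed in as arguments. Every bbmap.sh call is prefixed with echo, so nothing is executed; the paths are the ones from the question.

```shell
# Hypothetical wrapper around the question's nested loop, in dry-run form:
# echo prints each bbmap.sh command instead of running it.
dry_run_nested() {
    local base=/storage/sequence_samples/Crump_lab_CoDL
    local bin read
    for bin in $(cat "$1"); do          # every bin ...
        for read in $(cat "$2"); do     # ... paired with every read ID
            echo bbmap.sh ref="$base/ALL_MAGs/$bin" nodisk \
                in="$base/SRA_trim_out/${read}_1.cleaned.fq.gz" \
                in2="$base/SRA_trim_out/${read}_2.cleaned.fq.gz" \
                covstats="coverage_stats_SRR-SRR/${bin}.vs.SELF"
        done
    done
}
```

Running dry_run_nested new-bins_list.txt new_names_metalist | wc -l should report (number of bins) × (number of read IDs) lines: one command for every combination. The covstats name depends only on the bin, so each bin's output file is written once per read ID and only the last run survives.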
I am not sure what I am doing incorrectly. The for loop works; I previously thought it was "stuck" on the first bin, SRR4101185.maxbin.008.fasta. It turns out, I think, that it is iterating each bin through every single ${read} file (about 300 of them), which is not what I want either! It just keeps overwriting the output, which is named after ${bin}. Ideally, I would like each bin to be recruited to its coinciding set of reads only once. This could be worked around if I could use only the first text file, new-bins_list.txt, and take the first part of each file name as the identifier for ${read}, but I am not adept enough with these tools to accomplish this. Dang computers doing exactly what I tell them to!
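The workaround described above (use only new-bins_list.txt and take the first part of each bin name as the read identifier) can be sketched with bash parameter expansion: ${bin%%.*} deletes everything from the first "." onward. This assumes every bin name looks like <SRR ID>.maxbin.<NNN>.fasta and the reads follow the <SRR ID>_1/_2.cleaned.fq.gz pattern, as in the question. The function name recruit_bins and the DRYRUN switch are hypothetical additions; the bbmap.sh options are exactly the ones the question already uses.

```shell
# Hypothetical helper: one bbmap.sh run per bin, using only the reads whose
# SRR ID matches the bin's prefix.
# DRYRUN defaults to "echo", so the commands are printed, not run.
recruit_bins() {
    local list=$1
    local base=/storage/sequence_samples/Crump_lab_CoDL
    local dry=${DRYRUN-echo}    # unset -> echo; set but empty -> run bbmap.sh for real
    local bin read_id
    # Word-splitting $(cat ...) is fine here because the bin names contain no spaces.
    for bin in $(cat "$list"); do
        read_id=${bin%%.*}      # SRR4101185.maxbin.008.fasta -> SRR4101185
        $dry bbmap.sh ref="$base/ALL_MAGs/$bin" nodisk \
            in="$base/SRA_trim_out/${read_id}_1.cleaned.fq.gz" \
            in2="$base/SRA_trim_out/${read_id}_2.cleaned.fq.gz" \
            covstats="coverage_stats_SRR-SRR/${bin}.vs.SELF"
    done
}
```

Run recruit_bins new-bins_list.txt first to inspect the printed commands, then DRYRUN= recruit_bins new-bins_list.txt to run bbmap.sh. Each bin gets exactly one command, and because covstats is named after the bin, nothing is overwritten. The two SRR4101185 bins in the listing (008 and 041) are each recruited once against the same SRR4101185 reads.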
People generally never accept that :-)
Someone may help in the meantime; otherwise I will look at this again in a bit. Keep hacking. Use echo until you are sure the output looks right.
Thanks for that suggestion. That made me realize what the real issue was, at least!
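Since the two lists in the question are line-aligned, another way to pair each bin with its reads is to read both files in lockstep instead of nesting one loop inside the other. A sketch, assuming both files have the same number of lines in the same order; pair_lists is a hypothetical name, and the echo stands in for the bbmap.sh command:

```shell
# Hypothetical helper: read line N of file $1 together with line N of file $2.
# The lists arrive on file descriptors 3 and 4, so stdin stays free for the loop body.
# The loop stops as soon as either file runs out of lines.
pair_lists() {
    local bin read_id
    while IFS= read -r bin <&3 && IFS= read -r read_id <&4; do
        echo "$bin -> $read_id"    # replace with bbmap.sh, using $read_id for the reads
    done 3< "$1" 4< "$2"
}
```

Unlike the nested loop, this runs once per line rather than once per combination.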