Hi! I want to map Illumina paired-end reads against a reference genome. In my directory, I only need the files that end with _paired_R1.fastq.gz and _paired_R2.fastq.gz. I am writing a script that builds a dictionary with the paired_R1 files as keys and the matching paired_R2 files as values, but I am having difficulty assigning the keys and values in a for loop. I understand that file1 and file2 are not defined, but I don't know how to assign the files matched by endswith to a key and a value, respectively.
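import os

if __name__ == '__main__':
    path = os.getcwd()
    dir_files = os.listdir(path)
    pair_reads = {}
    for file in dir_files:
        # Each R1 file becomes a key; its R2 mate (same prefix) becomes the value
        if file.endswith("_paired_R1.fastq.gz"):
            mate = file[:-len("_paired_R1.fastq.gz")] + "_paired_R2.fastq.gz"
            # Only record the pair if the R2 file is actually present
            if mate in dir_files:
                pair_reads[file] = mate
    print(pair_reads)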
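A variant of this approach keys the dictionary by sample name (the prefix before _paired_R1.fastq.gz, i.e. the same value passed to -n in the tepid-map command) and stores both mates as a tuple, while also listing any R1 file whose R2 mate is missing. This is a minimal sketch that assumes every sample follows the <sample>_paired_R1.fastq.gz / <sample>_paired_R2.fastq.gz naming from the question; pair_by_sample is a hypothetical helper name, not part of any library.

```python
import os

R1_SUFFIX = "_paired_R1.fastq.gz"
R2_SUFFIX = "_paired_R2.fastq.gz"


def pair_by_sample(filenames):
    """Return ({sample: (r1, r2)}, [R1 files that have no R2 mate])."""
    names = set(filenames)
    pairs = {}
    unpaired = []
    for name in sorted(names):
        if not name.endswith(R1_SUFFIX):
            continue  # skip R2 files and anything unrelated
        sample = name[: -len(R1_SUFFIX)]  # prefix shared by both mates
        mate = sample + R2_SUFFIX
        if mate in names:
            pairs[sample] = (name, mate)
        else:
            unpaired.append(name)
    return pairs, unpaired


if __name__ == "__main__":
    pairs, unpaired = pair_by_sample(os.listdir(os.getcwd()))
    print(pairs)
    if unpaired:
        print("No R2 mate found for:", unpaired)
```

Keying by sample name means the loop that launches the mapper can read r1, r2 = pairs[sample] and reuse sample directly as the run name.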
Thank you in advance!
What is the expected output? I am sure this can be done with a one-liner via the command line.
I will use TEPID, which maps the paired reads against the reference genome. The TEPID command is:
tepid-map -1 SRR4209894_paired_R1.fastq.gz -2 SRR4209894_paired_R2.fastq.gz -n SRR4209894 -x /../S288C/S288C -y /../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S -p 36 -s 350
For this reason, I need to pair up the R1 and R2 reads in my directory.
Use a simple bash script.
See here to understand how the ${} parameter expansions work.
The problem is that, instead of a variable, I want to assign keys and values to the files in a directory.
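The answer's bash script is not reproduced here, but loops of this kind typically use ${f%_paired_R1.fastq.gz} to strip the suffix (leaving the sample name for -n) and then rebuild the R2 name from that sample name. That also addresses the follow-up: no dictionary is strictly needed, because each value (the R2 file) can be computed from its key (the R1 file) inside the loop. Below is a rough Python analogue, offered as a sketch only: str.removesuffix needs Python 3.9+, tepid_commands and DRY_RUN are hypothetical names, the reference paths are copied verbatim from the tepid-map command in the question, and the Bowtie2/yaha labels on -x/-y reflect my understanding of the TEPID documentation, so check tepid-map -h before relying on them.

```python
import os
import shlex
import subprocess

R1_SUFFIX = "_paired_R1.fastq.gz"
R2_SUFFIX = "_paired_R2.fastq.gz"

# Reference paths copied verbatim from the tepid-map command in the question.
BOWTIE2_INDEX = "/../S288C/S288C"  # -x (assumed: Bowtie2 index)
YAHA_INDEX = "/../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S"  # -y (assumed: yaha index)


def tepid_commands(filenames, threads=36, insert_size=350):
    """Build one tepid-map argument list per R1 file that has an R2 mate."""
    names = set(filenames)
    commands = []
    for r1 in sorted(names):
        if not r1.endswith(R1_SUFFIX):
            continue
        # bash equivalent: sample=${f%_paired_R1.fastq.gz}
        sample = r1.removesuffix(R1_SUFFIX)
        # bash equivalent: r2=${sample}_paired_R2.fastq.gz
        r2 = sample + R2_SUFFIX
        if r2 not in names:
            print(f"skipping {r1}: {r2} not found")
            continue
        commands.append([
            "tepid-map",
            "-1", r1,
            "-2", r2,
            "-n", sample,
            "-x", BOWTIE2_INDEX,
            "-y", YAHA_INDEX,
            "-p", str(threads),
            "-s", str(insert_size),
        ])
    return commands


if __name__ == "__main__":
    DRY_RUN = True  # set to False to actually launch tepid-map
    for cmd in tepid_commands(os.listdir(os.getcwd())):
        print(shlex.join(cmd))  # shell-quoted preview of the call
        if not DRY_RUN:
            subprocess.run(cmd, check=True)
```

Run from the read directory, it only prints the commands while DRY_RUN is True, so they can be compared with the hand-written call before launching 36-thread jobs. Passing an argument list to subprocess.run instead of a single shell string avoids quoting problems with unusual file names.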