How to assign keys and values in a dictionary from files in a directory using Python
5.0 years ago
caro-ca ▴ 20

Hi! I want to map Illumina paired-end reads against a reference genome. I have a directory in which I only need the files that end with paired_R1.fastq.gz and paired_R2.fastq.gz. I am writing a script that builds a dictionary in which the paired_R1 files are the keys and the paired_R2 files are the values; however, I am having difficulty assigning the keys and values in a for loop. I understand that file1 and file2 are not defined, but I don't know how to turn the result of the "endswith" checks into a key and a value, respectively.

if __name__=='__main__':
    path = os.getcwd()
    dir_files = os.listdir(path)
    pair_reads = {}
    for file in dir_files:
        if file.endswith("_paired_R1.fastq.gz"):
            file = file1
            if file.endswith("_paired_R2.fastq.gz"):
               file = file2
               pair_reads[file1] = file2 
    print(pair_reads)

Thank you in advance!

dictionary python for loop • 1.4k views

What is the expected output? I am sure this can be done with a one-liner via the command line.


I will use TEPID, which maps the paired reads against the reference genome. The TEPID command is tepid-map -1 SRR4209894_paired_R1.fastq.gz -2 SRR4209894_paired_R2.fastq.gz -n SRR4209894 -x /../S288C/S288C -y /../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S -p 36 -s 350. For this reason, I need to pair up the read files in my directory.


Use a simple bash script.

for r1file in *_paired_R1.fastq.gz
do
    tepid-map -1 "${r1file}" -2 "${r1file/_R1/_R2}" -n "${r1file%%_*}" -x /../S288C/S288C -y /../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S -p 36 -s 350
done

See the "Shell Parameter Expansion" section of the Bash manual to understand how the ${} parameter expansions work.
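For anyone who would rather stay in Python, here is a rough sketch of what the two expansions in the loop do to a file name. The function name expand is made up for illustration; it only mirrors ${r1file/_R1/_R2} (replace the first _R1 with _R2) and ${r1file%%_*} (drop everything from the first underscore onward).

```python
def expand(r1file):
    """Mimic the two bash parameter expansions used in the loop (hypothetical helper)."""
    # ${r1file/_R1/_R2}: replace only the first occurrence of "_R1" with "_R2"
    r2file = r1file.replace("_R1", "_R2", 1)
    # ${r1file%%_*}: strip the longest suffix starting at an underscore,
    # i.e. keep the text before the first "_"
    sample = r1file.split("_", 1)[0]
    return r2file, sample


r2file, sample = expand("SRR4209894_paired_R1.fastq.gz")
print(r2file, sample)
```

This assumes the sample name itself contains no underscore; if it does, the stem would need to be derived differently.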

my_key = "hey there"
my_value = "ho there"
my_dict = {}
my_dict[my_key] = my_value
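Applied to file names rather than literals, the same assignment could look something like the sketch below. The function name pair_reads and the two suffix constants are my own choices for illustration, and it assumes each R2 file has exactly the same name as its R1 mate apart from the suffix.

```python
import os

R1_SUFFIX = "_paired_R1.fastq.gz"
R2_SUFFIX = "_paired_R2.fastq.gz"


def pair_reads(filenames):
    """Map each R1 file name to its R2 mate, skipping R1 files without a mate."""
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        if name.endswith(R1_SUFFIX):
            # Swap the R1 suffix for the R2 suffix to get the expected mate name
            mate = name[:-len(R1_SUFFIX)] + R2_SUFFIX
            if mate in names:
                pairs[name] = mate  # key = R1 file, value = R2 file
    return pairs


if __name__ == "__main__":
    print(pair_reads(os.listdir(os.getcwd())))
```

The key point is that the R2 name is derived from the R1 name instead of being found by a second endswith test on the same loop variable, since a single file name can never end with both suffixes.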

The problem is that, rather than fixed variables, I want the keys and values to come from the files in a directory.

5.0 years ago
Brice Sarver ★ 3.8k

There are good suggestions in the comments, but (reading between the lines) I think you're having problems because you're building a dictionary where your key:value pairs are the R1 and R2 reads.

What about storing as a tuple and unpacking? You know what needs to be appended to form the read pairs (i.e., _paired_R1.fastq.gz). Grab the stem, then assign the reads based on that.

import os
import re

results = {}
dir_files = os.listdir(".")
# modify here as needed - you want to grab the file's stem;
# lots of ways to do this.
# I've inferred here from your code above, but a simple x.split()
# will work depending on your stem.
file_stems = [
    re.sub(r"_paired_R1\.fastq\.gz$", "", x) for x in dir_files
    if x.endswith("_paired_R1.fastq.gz")
]
# build a tuple with the R1 and R2 names
for stem in file_stems:
    R1 = stem + "_paired_R1.fastq.gz"
    R2 = stem + "_paired_R2.fastq.gz"
    results[stem] = (R1, R2)

The rest is pretty straightforward. You simply iterate across your dictionary, and you'll be able to unpack with R1, R2 = results['key']. This can easily be passed to subprocess.call() or similar.
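A minimal sketch of that last step, assuming a dictionary shaped like the one described above (stem mapped to an (R1, R2) tuple). The helper names build_command and run_all are hypothetical, the flags and their values are copied from the tepid-map command quoted earlier in the thread, and the dry_run switch is my own addition so the commands can be inspected before anything is executed.

```python
import subprocess


def build_command(stem, r1, r2, x_path, y_path, p=36, s=350):
    """Assemble the tepid-map argument list for one read pair (hypothetical helper)."""
    return [
        "tepid-map", "-1", r1, "-2", r2, "-n", stem,
        "-x", x_path, "-y", y_path, "-p", str(p), "-s", str(s),
    ]


def run_all(results, x_path, y_path, dry_run=True):
    """Iterate over {stem: (R1, R2)} and print or run one command per pair."""
    for stem, (r1, r2) in results.items():  # unpack the tuple here
        cmd = build_command(stem, r1, r2, x_path, y_path)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if tepid-map exits non-zero


# Example input, using the sample name from the thread
example = {
    "SRR4209894": ("SRR4209894_paired_R1.fastq.gz", "SRR4209894_paired_R2.fastq.gz"),
}
run_all(
    example,
    "/../S288C/S288C",
    "/../S288C/S288C_reference_sequence_R64-2-1_20150113.X15_01_65525S",
)
```

Passing the command as a list avoids shell quoting problems with file names; set dry_run=False once the printed commands look right.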

EDIT: wrapping list comprehension to avoid cutoff.
