How can I use Python to list the filenames of all FASTQ files? Should I use os.listdir()? If so, how do I restrict it to FASTQ files only?
After this I also want to do some further analysis on these files, e.g. zcat filenames.recal.fastq.gz | wc
How can I do such things using Python?
Thanks!
Edit: I'm writing the Python myself. My script looks like this:
    #!/usr/bin/python
    import os, sys, re, gzip
    path = "/home/xxxx/Downloads"
    for file in os.listdir(path):
        if re.match('.*\.recal.fastq.gz', file):
            text = gzip.open(file,'r').read()
            word_list = text.split()
            number = word_list.count('J') + 1
            if number != 0:
                print file
Finding the fastq.gz files works, but then I get this error:
    Traceback (most recent call last):
      File "try.py", line 9, in <module>
        text = gzip.open(file,'r').read()
      File "/usr/lib/python2.7/gzip.py", line 34, in open
        return GzipFile(filename, mode, compresslevel)
      File "/usr/lib/python2.7/gzip.py", line 89, in __init__
        fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
    IOError: [Errno 2] No such file or directory: 'ERR001274_1.recal.fastq.gz'
I think there's something wrong with how I'm using gzip. Also, why can't I open ERR001274_1.recal.fastq.gz? It DOES exist. Any ideas? Thanks!
Try glob: import glob; print glob.glob("*.fastq"). You might also clarify your question: why not just do it on the command line?
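To expand on the glob suggestion, here is a minimal sketch. The function name list_fastq and the default pattern "*.fastq*" are my own illustrative choices, not something from the question. It assumes you want both plain and gzipped FASTQ files, and that you want full paths back so they can be opened from any working directory.

```python
import glob
import os


def list_fastq(directory, pattern="*.fastq*"):
    """Return sorted full paths in `directory` matching a shell-style pattern.

    The default pattern (an assumption) matches names such as
    sample.fastq and sample.recal.fastq.gz.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Example call (hypothetical directory):
#     for path in list_fastq("/home/xxxx/Downloads", "*.recal.fastq.gz"):
#         print(path)
```

Because the directory is part of the pattern, glob returns paths that already include it, which avoids the "No such file or directory" problem you get when opening bare file names.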
command-line alternative: http://www.infoanda.com/resources/find.htm
Your Python code is not showing up correctly formatted, please indent it with four spaces. See: http://meta.stackoverflow.com/questions/22186/how-do-i-format-my-code-blocks
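You need to provide the full path to the file on line 3 of your loop (the gzip.open call). The variable file in your loop is just a string holding the file name within the Downloads folder, so unless you run the script from inside that folder, gzip.open looks for the file in the current working directory and cannot find it. Use os.path.join(path, file) instead.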
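A rough sketch of that fix, which also counts lines the way zcat file | wc -l would. The helper names count_lines and fastq_line_counts are made up for illustration, and the suffix default is taken from the file names in the question. It reads the file line by line rather than calling read(), on the assumption that FASTQ files can be too large to hold in memory comfortably.

```python
import gzip
import os


def count_lines(gz_path):
    """Count lines in a gzip-compressed file, similar to `zcat file | wc -l`."""
    with gzip.open(gz_path, "rb") as handle:
        # Iterate lazily so the whole file is never loaded at once.
        return sum(1 for _ in handle)


def fastq_line_counts(directory, suffix=".recal.fastq.gz"):
    """Map each matching file name in `directory` to its line count."""
    counts = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            # os.listdir gives bare names; join with the directory to open them.
            full_path = os.path.join(directory, name)
            counts[name] = count_lines(full_path)
    return counts


# Example call (path from the question):
#     print(fastq_line_counts("/home/xxxx/Downloads"))
```

A standard four-line-per-record FASTQ file has one read per four lines, so dividing the count by 4 gives the number of reads under that assumption.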