Hi all,
I've downloaded a Python script from the Supplementary data from a paper "Genome-wide copy number analysis of single cells', from Nature protocols 2012, Baslan et al. They provide several scripts to use to make your own pipeline to analyse copy numbers from sequencing data. I've ran some scripts from their supplementary files already, and these don't give any problems. However, I have some problems when I try to generate synthetic 50bp reads for each position in the hg19 genome. When I try to run this script 'hg19.generate.reads.k50.py' I don't get an output. Actually, I do get an output: one empty file, instead of several files. I have adapted the filepaths to correctly represent the paths for my case (as I did in the other scripts I ran). I don't know if I need to change some other parameters as well? (I'm not a pro with Python). I've copied the original script below, any help on how to run this would be very much appreciated.
#!/usr/bin/env python
import re
import sys
import random
import string
def main():
CHROMLIST = open("/filepath/chromlist.txt", "r")
chromlist = []
for x in CHROMLIST:
if x.rstrip().find("_") == -1:
chromlist.append(x.rstrip())
INFILE = False
OUTFILE = open("/filepath/sequence.part.0.k50.txt", "w")
outfilecount = 0
counter = 0
for chrFile in chromlist:
a = chrFile.split("/")
b = a[len(a) - 1]
thisChrom = b.split(".")[0]
if INFILE:
INFILE.close()
INFILE = open(chrFile, "r")
chr = []
x = ""
y = ""
INFILE.readline()
for x in INFILE:
chr.append(x.rstrip())
x = "".join(chr)
y = x.upper()
print "after read " + thisChrom
sys.stdout.flush()
for i in range(len(y) - 50):
counter += 1
thisRead = y[i:(i+50)]
if counter % 150000000 == 0:
OUTFILE.close()
outfilecount += 1
OUTFILE = open("/filepath/sequence.part." + str(outfilecount) + ".k50.txt", "w")
OUTFILE.write("@" + thisChrom + "." + str(i))
OUTFILE.write("\n")
OUTFILE.write(thisRead)
OUTFILE.write("\n")
OUTFILE.write("+" + thisChrom + "." + str(i))
OUTFILE.write("\n")
OUTFILE.write("b" * len(thisRead))
OUTFILE.write("\n")
INFILE.close()
OUTFILE.close()
if __name__ == "__main__":
main()
I've corrected the formatting as much as I could without actually going through the code. In general, it's often simpler to just link to a gist.
I agree to an extent. Defeats our markdown move though :(