Hello everyone!
So, I am new to Python (I just did a one-week crash course), and a colleague asked me to help him write a little Python 3 program. It takes a FASTQ file, a query sequence, a barcode file and an output file, and it should find all the individual barcodes (from the barcode file) that occur in the same reads as the query sequence in the FASTQ file.
I wrote this little contraption of mine that almost does the job:
import sys

infile = open(sys.argv[1], "r")   # FASTQ file
vfile = sys.argv[2]               # query sequence (a string, not a file name)
barcode = open(sys.argv[3], "r")  # barcode file, one barcode per line
outfile = open(sys.argv[4], "w")  # output file for the matching FASTQ records
usedbarcodes = []
count = 0
seq = ""
for line in infile:
    if count == 0:
        Id = line.rstrip()
    elif count == 1:
        if line.find(vfile) > 0:
            for bar in barcode:
                print(bar)
                if line.find(bar) > 0:
                    if bar in usedbarcodes:
                        break
                    else:
                        print("Yeah")
                        seq = line.rstrip()
                        bar = bar.rstrip()
                        usedbarcodes.append(bar)
                        print("Used barcodes until now", usedbarcodes)
                        break
    elif count == 2:
        sign = line.rstrip()
    elif count == 3:
        q = line.rstrip()
        if len(seq) > 10:
            sequence = [Id, seq, sign, q]
            print("\n".join(sequence), file=outfile)
        count = 0
        seq = ""
    if count < 3:
        count += 1
infile.close()
barcode.close()
outfile.close()
The problem seems to be that the order of the barcodes matters when trying to find them. This suggests that the inner loop does not search through all the barcodes every time, but only through the ones that have not been searched before. I expected the loop to restart from the first barcode every time. Could anyone tell me why it does that, and how to avoid it?
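A likely explanation, sketched below with io.StringIO standing in for the real barcode file: a file object returned by open() is a one-shot iterator over its lines. The first `for bar in barcode:` loop consumes lines, and when it hits `break` the next loop resumes right after the barcode that matched; once the end of the file is reached, later loops see nothing at all. That would make the result depend on barcode order. Reading the barcodes into a list once (or calling barcode.seek(0) before each inner loop) lets every read see the full list. The helper name lines_seen_per_pass is only for illustration.

```python
import io

def lines_seen_per_pass(handle, passes):
    """Count how many lines each successive `for` loop over `handle` sees."""
    counts = []
    for _ in range(passes):
        counts.append(sum(1 for _line in handle))
    return counts

# io.StringIO behaves like a file from open(): a one-shot iterator over its lines
barcode_file = io.StringIO("ACGT\nTTGA\nGGCC\n")
passes = lines_seen_per_pass(barcode_file, 3)
print(passes)  # [3, 0, 0]: only the first loop sees any barcodes

# After a `break`, the next loop resumes right after the barcode that matched
barcode_file = io.StringIO("ACGT\nTTGA\nGGCC\n")
for bar in barcode_file:
    if bar.strip() == "TTGA":
        break
remaining = [bar.strip() for bar in barcode_file]
print(remaining)  # ['GGCC']: ACGT and TTGA are never checked again

# Fix: read the barcodes once into a list, which can be looped over any number of times
barcode_file = io.StringIO("ACGT\nTTGA\nGGCC\n")
barcodes = [line.strip() for line in barcode_file if line.strip()]
print(barcodes)  # ['ACGT', 'TTGA', 'GGCC']
```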
Sorry again for the probably extremely inefficient and complicated coding. I will gladly accept any constructive criticism on that too =D. Thanks a lot in advance!
Néstor
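In answer to the request for constructive criticism, here is one possible restructuring. It is a sketch rather than a tested tool, assuming plain four-line FASTQ records (no wrapped lines) and one barcode per line, and it keeps the original logic of writing only the first read per barcode and stopping at the first barcode found in a read. Besides reading the barcodes into a list, it strips the newline from each barcode before searching (a barcode still ending in "\n" only matches at the very end of a read, and `bar in usedbarcodes` compares it against stripped entries, so it will not match). It uses `in` instead of `find(...) > 0`, which misses a match at position 0, and it takes four lines at a time instead of a manual counter: as far as I can tell, in the posted loop `count = 0` is immediately followed by `count += 1`, so the next record's header line is treated as its sequence. The function names read_barcodes, fastq_records, select_reads and main are hypothetical.

```python
import sys
from itertools import islice


def read_barcodes(lines):
    """Return the barcodes as a list of stripped strings, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fastq_records(lines):
    """Yield (header, seq, plus, qual) tuples, four lines at a time.

    Assumes plain 4-line FASTQ; an incomplete trailing record is dropped.
    """
    it = iter(lines)  # works for open files and for plain lists of lines
    while True:
        chunk = [line.rstrip() for line in islice(it, 4)]
        if len(chunk) < 4:
            return
        yield tuple(chunk)


def select_reads(fastq_lines, query, barcodes):
    """Return (records, used): the first read containing `query` for each barcode."""
    used = []  # barcodes already written, in the order they were found
    records = []
    for header, seq, plus, qual in fastq_records(fastq_lines):
        if query not in seq:  # `in` also matches at position 0, unlike find(...) > 0
            continue
        for bar in barcodes:  # a list, so every read is checked against all barcodes
            if bar in seq:
                if bar not in used:
                    used.append(bar)
                    records.append((header, seq, plus, qual))
                break  # as in the original: stop at the first barcode found
    return records, used


def main(fastq_path, query, barcode_path, out_path):
    """Read the inputs, write the selected reads, and report the barcodes used."""
    with open(barcode_path) as bc:
        barcodes = read_barcodes(bc)
    with open(fastq_path) as fq:
        records, used = select_reads(fq, query, barcodes)
    with open(out_path, "w") as out:
        for record in records:
            print("\n".join(record), file=out)
    print("Used barcodes:", used)


if __name__ == "__main__":
    if len(sys.argv) == 5:
        main(*sys.argv[1:])
    else:
        print("usage: python3 script.py reads.fastq QUERY barcodes.txt out.fastq")
```

It is run with the same four arguments as before; the script and file names in the usage line are placeholders. Keeping the search in select_reads, separate from the file handling, also makes it easy to try out on a short list of lines before running it on a real FASTQ file.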
Have you looked into Biopython?
While nestorvazquezb will no doubt benefit from Biopython, learning how to do something with just the basic syntax of a programming language is an important skill to get under one's belt. This is the right way to learn a new language, and I commend nestorvazquezb for that.
I don't really understand how it works =\
The tutorial and cookbook are great resources.
In a nutshell: