Hi all, another noob question from me.
I finally finished the FASTA splitter from my last post, which I will update soon. My new task, still in Python, is to chunk a FASTA file into, well... chunks.
```python
FileInput = input('What is the full address of the file you need to work with? ')
FileOutPut = input('What is the address of the directory you want to save to? ')
OrganismOI = input('What organism are you looking at? ')
ChunkSize = int(input('What is the chunk size (in kb) you want? '))
Chunk = ChunkSize * 1000
Counter = 0
FileName = FileOutPut + '/' + OrganismOI + str(Counter) + '.fa'
FILE = open(FileInput, 'r')
File = FILE.read()
FileSize = len(File)
print(str(FileSize) + ' is the length of the file in bp.')
# This also includes the FASTA title (header) line
print('Final number of files to be expected is ' + str(FileSize / Chunk) + '.')
for chunk in File:
    Save = open(FileName, 'w')
    Save.write(File[:Chunk])
    Save.close()
```
For the life of me, as soon as it comes to loops, I die inside. No matter how much I read or practise, it just isn't going in. This script is supposed to go through the original FASTA file and, for every chunk (ChunkSize, user defined), write out a file containing that chunk, named OrganismOI + Counter (Counter is supposed to number the files). Once again, help is very much appreciated. If I can then add a co-ordinate system, I'll be set for now. If anyone has any books or materials that could feasibly help me learn this stuff, please send them my way. Would the BioStar handbook be a sensible purchase?
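Why the loop above misbehaves: `for chunk in File:` walks through the file text one character at a time, and every pass opens the same `FileName` (it is built once, before the loop, and `Counter` is never incremented) and writes the same slice `File[:Chunk]`. The result is a single file holding the first chunk. Two smaller points: `len(File)` also counts the header line and every newline, and `FileSize / Chunk` is a float that should be rounded up, since a shorter last chunk still needs its own file. Below is a minimal sketch of the intended loop, assuming a single-record FASTA file; the helper names `read_sequence`, `split_into_chunks`, `expected_file_count` and `write_chunks` are hypothetical, not from any library.

```python
import math
import os


def read_sequence(path):
    """Return the bases of a single-record FASTA file as one string.

    Header lines (starting with '>') and line breaks are dropped, so the
    length is counted in bases rather than in characters of the file.
    """
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">"))


def split_into_chunks(sequence, chunk_size):
    """Return consecutive slices of at most chunk_size bases."""
    # range() steps by chunk_size, so each start is the first base of a chunk.
    return [sequence[start:start + chunk_size]
            for start in range(0, len(sequence), chunk_size)]


def expected_file_count(length, chunk_size):
    # Round up: a shorter final chunk still gets its own file.
    return math.ceil(length / chunk_size)


def write_chunks(sequence, chunk_size, out_dir, organism):
    """Write each chunk to out_dir/<organism><counter>.fa; return the paths."""
    paths = []
    # enumerate() supplies the counter, so every pass gets a new file name.
    for counter, chunk in enumerate(split_into_chunks(sequence, chunk_size)):
        path = os.path.join(out_dir, organism + str(counter) + ".fa")
        with open(path, "w") as save:
            save.write(">" + organism + str(counter) + "\n")
            save.write(chunk + "\n")
        paths.append(path)
    return paths
```

With the prompts from the original script, the call would be `write_chunks(read_sequence(FileInput), Chunk, FileOutPut, OrganismOI)`. Each chunk is written on one long line; wrap it if a downstream tool expects fixed-width sequence lines.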
Again, thanks everyone.
Can you clarify: is this meant to segregate whole sequences into numbered chunks (e.g. a file with 30 sequences split into 3 files of 10 sequences), or is the intention to chop up the sequences themselves into windows (e.g. a file with 30 sequences split into 30 files, each containing one sequence broken into windows of length n)?
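For the first interpretation (whole sequences grouped into files of n sequences), one way to sketch it is to parse the records and then batch them. `parse_fasta` and `batch_records` are hypothetical names for illustration; if Biopython is installed, `SeqIO.parse(handle, "fasta")` can take the place of the hand-written parser.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def batch_records(records, per_file):
    """Group records into lists of at most per_file records each."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == per_file:
            yield batch
            batch = []
    if batch:
        yield batch  # leftover records that did not fill a whole batch
```

Each batch can then be written to its own numbered file, using a counter from `enumerate()` in the same way as in the chunking loop.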
To your last point: any number of books and websites exist to help you learn, but I am personally an advocate of learning by doing. For me at least, it's the fastest way. Write code. Write lots. Write bad code. Pick up new and better habits from code you find online. Make the bad code a bit less bad. Rinse and repeat.
Of course. I have the whole genome of Salmonella CT18 in a basic, non-annotated nucleotide FASTA file, and I want to break it into chunks of 50 kb to ease downstream processing. Later on, I think it will be useful to break it into chunks containing a number of gene sequences, but not yet.
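For a genome cut into fixed 50 kb pieces, the co-ordinate idea from the question can be folded in by putting each chunk's 1-based start and end position in its header. This is a sketch under assumptions: the input is the sequence string with the header and line breaks already removed, and the header layout `>name_number start-end` and the 60-character line width are arbitrary choices, not a standard. `windows` and `fasta_chunks` are hypothetical names.

```python
def windows(sequence, size):
    """Yield (start, end, chunk) with 1-based, inclusive coordinates."""
    for offset in range(0, len(sequence), size):
        chunk = sequence[offset:offset + size]
        # The last chunk may be shorter, so end uses the real chunk length.
        yield offset + 1, offset + len(chunk), chunk


def fasta_chunks(name, sequence, size, width=60):
    """Return one FASTA entry per window, with coordinates in the header.

    Sequence lines are wrapped at `width` characters.
    """
    entries = []
    for number, (start, end, chunk) in enumerate(windows(sequence, size)):
        lines = [">{}_{} {}-{}".format(name, number, start, end)]
        lines += [chunk[i:i + width] for i in range(0, len(chunk), width)]
        entries.append("\n".join(lines) + "\n")
    return entries
```

For 50 kb chunks, call `fasta_chunks("CT18", genome, 50 * 1000)` and write each returned entry to its own file. If the file holds more than one record (for example, plasmids alongside the chromosome), run it on each record separately so no chunk spans two sequences. Fixed-size windows will cut through genes at the boundaries; keeping genes whole would need an annotation, which this sketch does not use.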
And thanks, that's what I'm trying to do, but I'm clearly going wrong somewhere.
Not directly, since the BioStar handbook does not cover Python programming. If you do purchase the book, you get access to "Python programming in 100 hours", a separate series of video lectures that Prof. Istvan Albert is including with the book. Unfortunately, he has not completed that series yet; only 4 videos are currently available.