Entries Per File FASTA Splitter, not quite working
1
1
Entering edit mode
5.1 years ago
damonlbp ▴ 20

Hello again everyone,

Once again I'm asking for help, this time I have managed to write a semi-decent script to split a FASTA file into multiple files dependent on a User defined number

Problem is, it is only counting headers, so for example counting up to 100 headers in the below example but not really taking notice of the sequence for the 100th headers sequence. That is pushed into the top of the next file. I've tried counting Name and saving it to a list to then zip with Listy to then be saved to the file but at this point I think I'm trying to edit some thing that's too far gone. Anyone have any ideas?

x = '/Users/me/Desktop/FUGU/Original/TakiFugu.fa'
d = '/Users/me/Desktop/FUGU/'
m = 'FUGU'
f = 100 #Takes about 3 minutes to run


def EnteriesPer(FileInput, FileOutPut, OrganismOI, EntryNo):

    EntryNo += 1
    Name = ""
    Counter = 0
    Listy = []
    FileCounter = 0

    FileCounter = 0
    with open(FileInput, 'r') as Org:
        for line in Org:
            Listy.append(line.strip())
            if line[0] == '>':
                Counter += 1
                if Name == "":
                    if Counter == EntryNo:
                        FileCounter += 1
                        with open(FileOutPut + OrganismOI + str(FileCounter) + '.fasta', 'w') as Oh:
                            Oh.write('\n'.join(Listy).strip())
                            Counter = 0
                            Listy = []
                            Length = 0
                            print('Thats '+ str(FileCounter))

This is currently part of my attempt at a much larger script which will parse files in a pipeline I will eventually set up (a rite of passage before starting work proper I am told), this will involve 5 functions and some arg.parse to call each function when needed so thats fun.

Thanks in advance everyone.

software-error fasta gene python • 804 views
ADD COMMENT
1
Entering edit mode

What you are experiencing is because you increment the Counter after you've seen the header. Then you check if counter is specified value and write the file - while you have NOT yet read the sequence.

You'll need to either set up some kind of buffer for reading one sequence completely (and then add it to the others) or be able to seek to previous line and pop last item from the list, when you are writing the file (the f.tell() and f.seek() might interest you).

I'll leave the implementation up to you.

ADD REPLY
0
Entering edit mode
5.0 years ago
damonlbp ▴ 20

If anyones still watching this, this is my working final product!

  def read_fasta(filetoparse):  # Works as part of entryfunction
    """A function which opens and splits a fasta into name and seq"""
    name, seq = None, []
    for line in filetoparse:
        line = line.rstrip()
        if line.startswith(">"):
            if name:
                yield (name, ''.join(seq))
            name, seq = line, []
        else:
            seq.append(line)
    if name:
        yield (name, ''.join(seq))


def entryfunction(FI, FO, EN=1):  # <- Works
    """The entryfunction function splits a FASTA file into a user defined
    number of entries per file"""
    count = 0
    filecounter = 0
    entry = []

    with open(FI) as filetoparse:
        for name, seq in read_fasta(filetoparse):
            nameseq = name, seq
            entry.append(nameseq)
            count += 1

            if count == EN:
                filecounter += 1

                with open(FO + str(filecounter) + '.fa', 'w') as done:
                    for idss, sequence in entry:
                        done.write(idss + '\n' + sequence + '\n\n')

                    count = 0
                    entry = []

        filecounter += 1
        with open(FO + str(filecounter) + '.fa', 'w') as done:
            for idss, sequence in entry:
                done.write(idss + '\n' + sequence + '\n\n')

            entry = []
            print('Give me a second to load files')
ADD COMMENT

Login before adding your answer.

Traffic: 2147 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6