I have a FASTA file containing 100 random permutations of a genome, which has then been broken up into ORFs. Here is a sample:
>Shuffled1_1 [5 - 79] Based on NC_000908.2
MNGLISMLIYFDEYENEIFVEKKDL
>Shuffled1_2 [36 - 101] Based on NC_000908.2
MMSMKMKFSLKKKICNYIPKLQ
>Shuffled1_3 [19 - 111] Based on NC_000908.2
MNVNLLWWVWKWNFRWKKRFVTTYQSSNKLK
>Shuffled1_4 [105 - 134] Based on NC_000908.2
MKVERQISTL
>Shuffled1_5 [148 - 177] Based on NC_000908.2
MINNSSSPAL
>Shuffled1_6 [174 - 242] Based on NC_000908.2
MINKFPELLLTQLSYVRYYLFAI
There are over 3 million records. I need to estimate the 95th percentile of ORF length by computing it separately for each shuffled permutation and then averaging the 100 estimates. I am having trouble figuring out how to get the 95th percentile of Shuffled1 before moving on to Shuffled2, and so on. Any help would be appreciated. I was able to pull out just the first shuffled permutation using awk and get its percentile with this code:
from Bio import SeqIO
import numpy as np

# Record the length of every ORF in the first permutation.
seq_lens = []
for record in SeqIO.parse("firstperm.fasta", "fasta"):
    seq_lens.append(len(record))  # sequence length in residues

a = np.array(seq_lens)
p = np.percentile(a, 95)
print(p)
I am just trying to figure out how to repeat this over the entire FASTA file for all 100 shuffled permutations. Basically, I am looking for a list of 100 numbers, one 95th percentile per permutation.
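For reference, here is a minimal sketch of one way to do it in a single pass, assuming every record ID starts with the permutation name followed by an underscore (e.g. Shuffled1_1, as in the sample above), and using all_perms.fasta as a placeholder name for the combined file:

from collections import defaultdict

from Bio import SeqIO
import numpy as np

# Group ORF lengths by permutation in one pass over the file.
# "all_perms.fasta" is a placeholder for the combined file of all
# 100 permutations; the key is the ID up to the last underscore.
lengths = defaultdict(list)
for record in SeqIO.parse("all_perms.fasta", "fasta"):
    perm = record.id.rsplit("_", 1)[0]  # "Shuffled1_1" -> "Shuffled1"
    lengths[perm].append(len(record))

# One 95th-percentile estimate per permutation, then their average.
percentiles = [np.percentile(lens, 95) for lens in lengths.values()]
print(percentiles)           # the 100 per-permutation estimates
print(np.mean(percentiles))  # the averaged 95th-percentile estimate

Keeping all the lengths in memory should be fine here: 3 million integers is small, so there is no need to split the file by permutation first.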