Computing chunk of fasta file before moving to next chunks using python
1
0
Entering edit mode
7.2 years ago
ToastedGoat ▴ 10

I have a fasta file with 100 random permutations of a genome. It is then broken up into ORFs. Here is a sample:

>Shuffled1_1 [5 - 79] Based on NC_000908.2
>MNGLISMLIYFDEYENEIFVEKKDL
>Shuffled1_2 [36 - 101] Based on NC_000908.2
>MMSMKMKFSLKKKICNYIPKLQ
>Shuffled1_3 [19 - 111] Based on NC_000908.2
>MNVNLLWWVWKWNFRWKKRFVTTYQSSNKLK
>Shuffled1_4 [105 - 134] Based on NC_000908.2
>MKVERQISTL
>Shuffled1_5 [148 - 177] Based on NC_000908.2
>MINNSSSPAL
>Shuffled1_6 [174 - 242] Based on NC_000908.2
>MINKFPELLLTQLSYVRYYLFAI

There are over 3 million records. I need to calculate the 95th percentile by averaging the estimates of each shuffled permutation. I am having trouble figuring out how to get the 95th percentile of shuffled1 before moving onto shuffled2 and so on. Any help would be appreciated. I was able to pull out just the first shuffled permutation using awk and get the percentile using this code:

from Bio import SeqIO
import numpy as np

seqLen=[]

fasta_sequence = SeqIO.parse(open("firstperm.fasta"),"fasta")

for fasta in fasta_sequence:
    name,sequence = fasta.id, str(fasta.seq)
    seqLen.append(len(fasta))

a = np.array(seqLen)
p = np.percentile(a,95)
print p

I am just trying to figure out a way to duplicate this over the entire fasta file for all 100 shuffled permutations. Basically I am looking for 100 numbers in a list for each 95th percentile.

python fasta • 1.6k views
ADD COMMENT
1
Entering edit mode
7.2 years ago
george.ry ★ 1.2k
from collections import defaultdict as dd
from Bio import SeqIO
import numpy as np

shuffles = dd(list)

for record in SeqIO.parse(open('firstperm.fasta'), 'fasta'):
    id = record.id.split('_')[0]
    shuffles[id].append(len(record))

for k, v in shuffles.items():
    print(k, np.percentile(v, 95))

Something like that? (Not tested ... on the wrong machine)

ADD COMMENT
1
Entering edit mode

Thanks this works! Appreciate it!

ADD REPLY

Login before adding your answer.

Traffic: 2579 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6