Appending Seqrecords In Biopython
13.1 years ago
User 5433 ▴ 10

I'm trying to generate a test dataset of randomly sampled 150 bp sequences from a complete genome. I found what appeared to be the perfect Biopython script on the SeqIO wiki page (http://biopython.org/wiki/SeqIO#Randomsubsequences), but genome_frags.append(record) throws an AttributeError: 'Seq' object has no attribute 'append'. I'm not sure why this script would be listed if that is really the case, unless this is an issue with my current version of Biopython (1.52), which I run off my work server (so it would take some work to get it updated). The Biopython Tutorial and Cookbook says that Seq objects could only be added starting with version 1.53, but adding is a concatenation, not an appending of separate objects that can then be written to a single file. Does anyone know a way around this, keeping in mind that I can write only very minimal code at this point?

genome_frags=[]
limit=len(genome_record.seq)
for i in range(0, 1000) :
    start=randint(0,limit-150)
    end=start+150
    genome_frags=genome_record.seq[start:end]
    print (type(genome_frags))
    record=SeqRecord(genome_frags,'fragment_%i' % (i+1),'','')
    genome_frags.append(record)
biopython random fasta • 8.6k views

Do you want a list of SeqRecords, each containing a random sample or one long SeqRecord made of those random samples?

13.1 years ago
brentp 24k

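You're overwriting genome_frags inside the loop: the slice assignment replaces the list with a Seq object, which has no append method. Use a different name for the fragment: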

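from random import randint
from Bio.SeqRecord import SeqRecord

# genome_record is assumed to be a SeqRecord loaded earlier, as in the wiki script
genome_frags = []
limit = len(genome_record.seq)
for i in range(0, 1000):
    start = randint(0, limit - 150)
    end = start + 150
    # use a separate name so the genome_frags list is not overwritten
    frag = genome_record.seq[start:end]
    record = SeqRecord(frag, 'fragment_%i' % (i + 1), '', '')
    genome_frags.append(record)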
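
For anyone who wants to check the sampling logic without a genome file or a Biopython install, here is a minimal stand-alone sketch of the same loop. It is only an illustration: it uses a plain Python string in place of a Seq, and the names sample_fragments, to_fasta, and toy_genome are made up for this example rather than taken from the wiki script.

```python
import random


def sample_fragments(genome, n_frags=1000, frag_len=150, seed=None):
    """Return a list of (id, sequence) pairs sampled at random from genome."""
    rng = random.Random(seed)
    limit = len(genome)
    if frag_len > limit:
        raise ValueError("fragment length exceeds genome length")
    fragments = []  # the list keeps its own name for the whole loop
    for i in range(n_frags):
        # randint includes both endpoints, so the last valid start is limit - frag_len
        start = rng.randint(0, limit - frag_len)
        frag = genome[start:start + frag_len]
        fragments.append(("fragment_%i" % (i + 1), frag))
    return fragments


def to_fasta(fragments):
    """Format (id, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in fragments)


toy_genome = "ACGT" * 100  # made-up 400 bp stand-in for a real genome
frags = sample_fragments(toy_genome, n_frags=5, seed=0)
fasta_text = to_fasta(frags)
```

With real SeqRecord objects you would normally let Bio.SeqIO write the file instead of formatting FASTA by hand; the hand-rolled to_fasta above is only there to keep the sketch dependency-free.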

This worked beautifully. Simple mistake. Thanks so much! It's small victories like this that keep me going.

13.1 years ago

You are assigning to the variable genome_frags twice in the same loop. Rename the variable in the first assignment, genome_frags = genome_record.seq[start:end], and update the record=... line to use the new name.
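
To see why the double assignment produces that error, here is a small sketch of the same name-rebinding problem. It assumes nothing about Biopython: a plain str stands in for the Seq object, and the variable names are illustrative only.

```python
genome = "ACGT" * 100  # made-up stand-in for genome_record.seq

# Buggy pattern: the name that held the list is rebound to the slice.
frags = []
frags = genome[0:150]  # frags is now a str, the list is gone
try:
    frags.append("fragment_1")
    error_message = None
except AttributeError as exc:
    # str has no append method, just as Seq has none
    error_message = str(exc)

# Fixed pattern: the slice gets its own name, so the list survives.
frags = []
frag = genome[0:150]
frags.append(frag)
```

The error message names the type of whatever the variable currently points to, which is why the original traceback mentioned 'Seq' rather than a list.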

13.1 years ago

If you need them, some tools are optimized for this purpose; see: How To Produce Simulated 'Synthetic' Sequences
