Filter contigs by size: Different output between QUAST report and Python output

5.0 years ago

Dave Th ▴ 60

Hi all,

I'm trying to filter my contig dataset into different files by length, such as 500bp, 1kb, 2kb... I'm using the code below to produce my output.

def contigs_filter_by_length(fasta_input, size, fasta_output):
long_contigs =  [] #Create an empty list
for record in SeqIO.parse(fasta_input,"fasta"):
    if len(record.seq) >= size:
        long_contigs.append(record)
print("Found %i contigs" %len(long_contigs))
SeqIO.write(long_contigs,fasta_output,"fasta")

The problem is that when I cross-checked the output from the code against the QUAST report for my input file, there was a huge difference between them. QUAST indicated that there are 119787 contigs >= 500bp, while the FASTA output from the code contained 122046 contigs >= 500bp.

Is there anything wrong in my code that leads to this difference?

sequence assembly • 1.7k views

ADD COMMENT • link 5.0 years ago by Dave Th ▴ 60

I haven't seen anything wrong in your code. Have you compared the results? You could pick out some contigs that are reported by your Python code but not by QUAST, to see what caused the difference.

ADD REPLY • link 5.0 years ago by Jianyu ▴ 580
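One way to act on the suggestion above is to tabulate contig lengths directly and look at what sits near the cutoff. The following is only a sketch: it uses a small hand-written FASTA reader instead of Biopython, and the function names (`read_fasta_lengths`, `count_at_least`, `duplicated_ids`) are hypothetical. Checking for repeated IDs is just one thing that might be worth inspecting; it is not a confirmed cause of the difference from QUAST.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Yield (contig_id, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the ID
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length  # emit the last record


def count_at_least(lengths, thresholds=(0, 500, 1000, 2000)):
    """Return {threshold: number of lengths >= threshold}."""
    lengths = list(lengths)
    return {t: sum(1 for n in lengths if n >= t) for t in thresholds}


def duplicated_ids(ids):
    """Return the sorted IDs that occur more than once."""
    return sorted(i for i, c in Counter(ids).items() if c > 1)


# Small in-memory demo; the sequences are made up for illustration
demo = [">c1 first", "ACGT" * 150, ">c2", "ACGT" * 100, "ACGT" * 30, ">c1", "AC"]
records = list(read_fasta_lengths(demo))
print(count_at_least(n for _, n in records))
print(duplicated_ids(i for i, _ in records))
```

To run this on a real assembly, pass an open file handle to `read_fasta_lengths` in place of the `demo` list. The per-threshold counts can then be put side by side with the corresponding rows of the QUAST report.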

I think this might be the key.

QUAST may be doing some additional filtering, such as removing 'junk' sequences that are obvious misassembly artefacts, or deduplication.

I'm not 100% certain, but that would be my immediate guess.

ADD REPLY • link 5.0 years ago by Joe 21k

What does "SeqIO.parse" stand for? (I'm trying to understand the command.) I'm trying to filter contigs, so this code could help me.

ADD REPLY • link 3.7 years ago by v.berriosfarias ▴ 140

That is the standard SeqIO interface included in Biopython (LINK).

ADD REPLY • link 3.7 years ago by GenoMax 147k

Hello Dave, I'm trying to use your code for filtering some contigs, but I got an indentation error message:

  File "contig_length_filter.py", line 2
    long_contigs = []
    ^
IndentationError: expected an indented block

So I suppose that I must add something between the brackets?

Regards :)

ADD REPLY • link 3.7 years ago by v.berriosfarias ▴ 140

IndentationError: expected an indented block ?

ADD REPLY • link 3.7 years ago by cpad0112 21k

The code in the first post has incorrect indentation levels for Python: everything after the def line needs to be indented as the function body. You should not copy it verbatim.

ADD REPLY • link 3.7 years ago by Joe 21k
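For anyone hitting the same IndentationError, here is a sketch of how the function from the first post might look with its body indented under the def line. It assumes Biopython is installed. Placing the import inside the function and adding the docstring are choices made for this sketch, not part of the original post.

```python
def contigs_filter_by_length(fasta_input, size, fasta_output):
    """Write the records of at least `size` bp from fasta_input to fasta_output."""
    # Assumption of this sketch: Biopython is imported here rather than at the
    # top of the file, so the file still loads on machines without it.
    from Bio import SeqIO

    long_contigs = []  # records that pass the length cutoff
    for record in SeqIO.parse(fasta_input, "fasta"):
        if len(record.seq) >= size:
            long_contigs.append(record)
    print("Found %i contigs" % len(long_contigs))
    SeqIO.write(long_contigs, fasta_output, "fasta")
```

A call would then look like `contigs_filter_by_length("contigs.fasta", 500, "contigs_500bp.fasta")`, where both file names are placeholders. Nothing goes inside the `[]` on the second line: it simply creates an empty list that the loop fills.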
