Scripting method to split a FASTQ diploid consensus sequence into bins and flag them by quality



9.3 years ago

memory_donk

Hi Biostars,

I need to write a script that accepts scaffolds from a diploid consensus sequence in FASTQ format (such as one generated by the command in [1]), breaks each sequence into non-overlapping 100 bp bins, and reports true or false for each bin depending on whether it passes some quality threshold.
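A minimal sketch of the binning step in Python, under a few assumptions: the qualities are Phred+33 (Sanger) encoded, and a bin "passes" when its mean Phred score is at least the threshold. The default threshold of 20 and the function names `decode_phred33` and `bin_passes` are hypothetical choices for illustration. The last bin of a scaffold is kept even if it is shorter than 100 bp.

```python
from statistics import mean
from typing import List, Tuple


def decode_phred33(qual_str: str) -> List[int]:
    """Convert a Phred+33 (Sanger) quality string to integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual_str]


def bin_passes(quals: List[int], bin_size: int = 100,
               threshold: float = 20.0) -> List[Tuple[int, int, bool]]:
    """Split per-base scores into non-overlapping bins of bin_size.

    Returns (start, end, passed) per bin in 0-based, half-open coordinates.
    'passed' uses an assumed criterion: mean Phred score >= threshold.
    The final bin may be shorter than bin_size.
    """
    results = []
    for start in range(0, len(quals), bin_size):
        window = quals[start:start + bin_size]
        results.append((start, start + len(window), mean(window) >= threshold))
    return results
```

For example, `bin_passes([30] * 100 + [5] * 100 + [40] * 50)` returns `[(0, 100, True), (100, 200, False), (200, 250, True)]`. Replacing `mean` with `min`, or with the fraction of bases above a cutoff, changes the pass criterion without touching the binning.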

Where I'm stumbling is finding an object-oriented module that neatly packages access to FASTQ sequences along with their quality scores. In BioPerl I'd have no trouble breaking a FASTQ sequence into bins and doing something with them, but there doesn't seem to be a method for extracting a region of a FASTQ entry together with its quality scores.
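In Biopython, `SeqIO.parse(handle, "fastq")` yields `SeqRecord` objects whose per-base scores are stored in `letter_annotations["phred_quality"]`, and slicing a record (e.g. `record[0:100]`) slices those scores together with the sequence, which covers region access. To avoid the dependency, the standard-library sketch below does the same. It assumes Phred+33 qualities and tolerates records whose sequence and quality are wrapped over several lines; `open_fastq`, `read_fastq` and `region` are hypothetical helper names.

```python
import gzip
from typing import IO, Iterable, Iterator, List, Tuple


def open_fastq(path: str) -> IO[str]:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality_string) for each FASTQ record.

    Quality lines are read until they match the sequence length, so a
    quality line that happens to start with '@' is handled correctly.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        name = (header[1:].split() or [""])[0]
        seq_parts: List[str] = []
        for line in it:  # sequence may span several lines up to the '+' line
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        qual_parts: List[str] = []
        qual_len = 0
        while qual_len < len(seq):  # read quality lines until lengths match
            line = next(it, None)
            if line is None:
                raise ValueError(f"{name}: truncated quality string")
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError(f"{name}: sequence and quality lengths differ")
        yield name, seq, qual


def region(seq: str, qual: str, start: int, end: int,
           offset: int = 33) -> Tuple[str, List[int]]:
    """Return the bases and Phred scores for 0-based, half-open [start, end)."""
    return seq[start:end], [ord(c) - offset for c in qual[start:end]]
```

Typical use: open the file with `with open_fastq("diploid.fq.gz") as fh:`, loop over `for name, seq, qual in read_fastq(fh):`, and call `region(seq, qual, start, start + 100)` for each 100 bp bin.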

I'm mostly comfortable with BioPerl and maybe could figure out BioPython if needed. Does anyone know of a module that does something to this effect?

[1] samtools mpileup -C50 -uf ref.fa aln.bam | bcftools view -c - | vcfutils.pl vcf2fq -d 10 -D 100 | gzip > diploid.fq.gz

fastq parsing bioperl biopython

