SNPs flanking sequences finding based on VCF and genome Fasta files
1
0
Entering edit mode
23 months ago
Young-Ho • 0

I referenced other poster that is "Extract SNPs flanking sequences based on VCF and genome Fasta files"

import pysam   
# open vcf file
vcf = pysam.VariantFile("Lespedeza_GBS_96ea_Filtered_SNP_77449.vcf")
# open fasta file
genome = pysam.FastaFile("Lespedeza_assembled_contig.fa")
# define by how many bases the variant should be flanked
flank = 50

# iterate over each variant
for record in vcf:
    # extract sequence
    #
    # The start position is calculated by subtract the number of bases
    # given by 'flank' from the variant position. The position in the vcf file
    # is 1-based. pysam's fetch() expected 0-base coordinate. That's why we
    # need to subtract on more base.
    #
    # The end position is calculated by adding the number of bases
    # given by 'flank' to the variant position. We also need to add the length
    # of the REF value and subtract again 1 due to the 0-based/1-based thing.
    #
    # Now we have the complete sequence like this:
    # [number of bases given by flank]+REF+[number of bases given by flank]
    seq = genome.fetch(record.chrom, record.pos-1-flank, record.pos-1+len(record.ref)+flank)

    # print out tab seperated columns:
    # CRHOM, POS, REF, ALT, flanking sequencing with variant given in the format '[REF/ALT]'
    print(
        record.chrom,
        record.pos,
        record.ref,
        record.alts[0],
        '{}[{}/{}]{}'.format(seq[:flank], record.ref, record.alts[0], seq[flank+len(record.ref):]),
        sep="\t"
    )

This work very well. But when I running this script, there was a problem.

Traceback (most recent call last):
  File "./flaking.py", line 26, in <module>
    seq = genome.fetch(record.chrom, record.pos-1-flank, record.pos-1+len(record.ref)+flank)
  File "pysam/libcfaidx.pyx", line 290, in pysam.libcfaidx.FastaFile.fetch
  File "pysam/libcutils.pyx", line 252, in pysam.libcutils.parse_region
ValueError: start out of range (-10)

It would be start position of flanking sequence < 0, that cause this error.

Instead of calling an error with making the program stop, can I continue finding for flanking sequences?

sequence flanking • 1.1k views
ADD COMMENT
0
Entering edit mode
23 months ago
flank = 50

if any SNP has a position POS lower than 50, it will result into a negative position in record.pos-1-flank. You should add a min statement in you code

ADD COMMENT
0
Entering edit mode

Thank you for your rapid reply. I am basic in python. Please tell me how to add min statemet in my code.

ADD REPLY

Login before adding your answer.

Traffic: 1676 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6