Sequence length in BED file
4.1 years ago
lsudupe ▴ 20

Hi!

I'm trying to plot the lengths of the sequences in a BED file. I know how to get the lengths with awk, but I want to do it in Python, using pybedtools. How do I do it?

Thank you!

bed python sequence • 1.5k views

What have you tried? Why do you want to do it in python?


I tried iterating over the components and got the following error: "OverflowError: Cannot convert negative value to CHRPOS". I am new to bioinformatics and I am not sure whether this is the correct approach. I know it is related to the start coordinates, but I don't know how to fix it. I want to do it in Python so that I can reuse the program for all the BED files I have to analyze.

Thanks!


It seems that a position is exceeding the maximum value that the variable being used for it can handle. You might need to search on how to get pybedtools to handle large integer values.


You're right. Thanks for your advice!

4.0 years ago

Why would you need pybedtools? Just subtract the start coordinate from the stop coordinate to get the sequence length (BED intervals are 0-based and half-open, so stop minus start is the length). Put the lengths into an array and plot a histogram (or do whatever else you want with the lengths):

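#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt
import sys

# Collect one length (stop - start) per BED line read from standard input.
# Columns are tab-separated: chrom, start, stop, ...
lengths = []
for line in sys.stdin:
    elems = line.rstrip().split('\t')
    lengths.append(int(elems[2]) - int(elems[1]))

plt.hist(np.array(lengths), bins='auto')  # arguments are passed to np.histogram
plt.title("Sequence lengths histogram")
plt.xlabel("Length (bp)")
plt.ylabel("Count")
plt.show()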

To use, e.g.:

$ python ./so471158.py < in.bed
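
The error quoted in the comments above mentions a negative value, so it may be worth checking the coordinates before computing lengths. Here is a minimal sketch of that check using only the standard library. The function name `read_lengths` and the sample data are made up for illustration, and the rules for what counts as a bad row (negative start, stop smaller than start, fewer than three columns, non-integer coordinates) are my assumptions, not something pybedtools prescribes:

```python
import io


def read_lengths(handle):
    """Return (lengths, bad_rows) for BED lines read from a file-like object.

    bad_rows holds (line_number, line_text) for rows whose coordinates
    cannot be used. The validity rules here are assumptions for this sketch.
    """
    lengths = []
    bad_rows = []
    for line_no, raw in enumerate(handle, start=1):
        line = raw.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        try:
            start, stop = int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            # Fewer than three columns, or non-integer coordinates.
            bad_rows.append((line_no, line))
            continue
        if start < 0 or stop < start:
            bad_rows.append((line_no, line))
            continue
        lengths.append(stop - start)
    return lengths, bad_rows


# Small made-up example: one header, two good rows, two bad rows.
sample = "track name=demo\nchr1\t0\t100\nchr1\t-5\t20\nchr2\t50\t40\n\nchr2\t10\t15\n"
lengths, bad_rows = read_lengths(io.StringIO(sample))
print(lengths)
print(bad_rows)
```

To run it on a real file, pass an open file handle (or `sys.stdin`) instead of the `io.StringIO` sample. The line numbers in `bad_rows` point at the rows to inspect or fix before plotting.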

Thanks for the example! It helped me a lot. I wanted to use pybedtools because my supervisor suggested it, but it isn't required.
