I am trying to convert a CSV file to BED. What is the best command to do this?
Below is a Python API solution using the pybed submodule I wrote.
Assume you have a CSV file named example.csv:
$ cat example.csv
chr1,100,200
chr2,400,500
chr3,100,200
Run the following in Python after installing the fuc package, which contains the pybed submodule:
>>> import pandas as pd
>>> from fuc import pybed
>>> df = pd.read_csv('example.csv', header=None)
>>> df.columns = ['Chromosome', 'Start', 'End']
>>> bf = pybed.BedFrame.from_frame(meta=[], data=df)
>>> bf.to_file('example.bed')
Check the resulting BED:
$ cat example.bed
chr1 100 200
chr2 400 500
chr3 100 200
Of course, you could have just replaced , with \t directly in the original CSV file, but using the pybed submodule will robustly check for any potential errors that could arise during file format conversion.
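For comparison, the plain delimiter swap can be done with Python's standard library alone. This is a minimal sketch, not how pybed works internally: the helper name `csv_to_bed` is hypothetical, and the checks it runs (exactly three fields, integer coordinates, a non-negative start, start not greater than end) are my own assumptions about what a "robust" conversion should reject. Like pybed, it copies coordinates unchanged and applies no 0-/1-based offset.

```python
import csv


def csv_to_bed(csv_path, bed_path):
    """Hypothetical helper: copy a 3-column CSV to a tab-delimited BED3 file.

    Raises ValueError on any row that is not a plausible BED3 record.
    Coordinates are written as-is; no 0-/1-based offset is applied.
    """
    with open(csv_path, newline="") as src, open(bed_path, "w", newline="") as dst:
        # BED is tab-delimited with one record per line.
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for lineno, row in enumerate(csv.reader(src), start=1):
            if not row:
                continue  # csv.reader yields [] for blank lines; skip them
            if len(row) != 3:
                raise ValueError(f"line {lineno}: expected 3 fields, got {len(row)}")
            chrom = row[0].strip()
            try:
                start, end = int(row[1]), int(row[2])
            except ValueError:
                raise ValueError(f"line {lineno}: non-integer coordinate") from None
            # start == end is allowed (zero-length features such as insertion points).
            if start < 0 or start > end:
                raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
            writer.writerow([chrom, start, end])
```

Calling `csv_to_bed('example.csv', 'example.bed')` on the example file above writes the same three records, separated by tabs.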
By that logic, any delimited file is not a format at all. Many people store numbers in CSV format, not just text. 1-based numbering is the generic convention, whereas 0-based numbering is a special case in the general representation of numbers. Since the OP didn't mention the numbering method used in the CSV, the assumption would be 1-based, not 0-based. In addition, most examples use 1-based rather than 0-based numbering, and that is what I assumed.
Thanks for the explanation. I see what you mean now: it entirely depends on whether the OP's data is 0-based or 1-based to begin with. Thanks to your comment, the OP will now know the risk: the pybed submodule won't add or subtract an offset. If needed, the OP should do this before constructing a pybed.BedFrame object. For example, to add 1 to every Start position:
>>> df = pd.read_csv('example.csv', header=None)
>>> df.columns = ['Chromosome', 'Start', 'End']
>>> df.Start = df.Start + 1
>>> bf = pybed.BedFrame.from_frame(meta=[], data=df)
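If the CSV does turn out to use 1-based coordinates with inclusive ends, the usual mapping to BED's 0-based, half-open convention subtracts 1 from the start and leaves the end unchanged, so the interval length is preserved. A minimal sketch, assuming inclusive end coordinates in the CSV (the function name is hypothetical); in the pandas snippet this corresponds to `df.Start = df.Start - 1`:

```python
def one_based_closed_to_bed(start, end):
    """Hypothetical helper: map a 1-based, fully closed interval [start, end]
    to BED's 0-based, half-open [start, end) coordinates."""
    if start < 1 or start > end:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    return start - 1, end


# chr1:100-200 in 1-based, closed coordinates covers 101 bases.
bed_start, bed_end = one_based_closed_to_bed(100, 200)
print(bed_start, bed_end)   # 99 200
print(bed_end - bed_start)  # 101, same length as the 1-based interval
```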
See the first two sentences of swbarnes2's comment here.
The second paragraph in Alex Reynolds' answer on that same Biostars post outlines the process: