How Do I Convert Fa Files To Bed Format?

How Do I Convert Fa Files To Bed Format?

3

Entering edit mode

12.9 years ago

Keziah ▴ 80

how do I convert fa files to bed format?

fasta bed • 38k views

ADD COMMENT • link updated 20 months ago by shelkmike ★ 1.4k • written 12.9 years ago by Keziah ▴ 80

2

Entering edit mode

@michael my guess would be that it is due to a confusion of formats & their purposes. happens to the best!

ADD REPLY • link 12.9 years ago by Deniz ▴ 140

0

Entering edit mode

Noahaus, your script gives different results than faidx does..... I believe faidx as it is a community tool thats been around a while, you may want to double check your script (or maybe file a bug with faidx),

ADD REPLY • link 5.1 years ago by john.major • 0

8

Entering edit mode

12.9 years ago

Pierre Lindenbaum 164k

You can't.

A Fa/Fasta file describes a sequence of DNA/Protein:

>Name
ATAGCTACGATTACGACGTACG
ATCGATCGATGCATCAGCTACT
AACTAGTCGATGATGCATACG...

A bed file describes some features mapped on a genome/sequence:

chr1  786 9879 gene1
chr2  486 979  gene2

The only thing you can do is saying that the BED contains only one feature: your sequence:

Name  0  1098 Name

ADD COMMENT • link 12.9 years ago by Pierre Lindenbaum 164k

3

Entering edit mode

12.9 years ago

Matt Shirley 10k

This is a good, though slightly misguided question. If you want to make a BED file from a FASTA sequence, you might do something like this:

Find your FASTA sequence. We'll use human hemoglobin beta as an example.
Use BLAST to align your sequence to the human reference genome.
In the alignment example above, you would pick the genomic alignment, not transcript, and choose "subject" start and end positions. Note that this example has 3 exons, so your BED file start would be 5186957 and end would be 5188159.
Create a BED file using the chromosome (11 in above example), chromstart (5186957), chromend (5188159), and a name for your gene (HBB). You can then upload this as a custom track in the UCSC genome browser.
You will probably want to add introns to your gene structure, so at this point you can the GenePred format instead of BED. This allows you to specify the strand, as well as introns, exons, and transcription factor binding sites.

ADD COMMENT • link 12.9 years ago by Matt Shirley 10k

2

Entering edit mode

12.9 years ago

Madelaine Gogol 5.3k

Uh, align it to a genome?

ADD COMMENT • link 12.9 years ago by Madelaine Gogol 5.3k

1

Entering edit mode

Use bowtie or bwa, generate bam format, then use bedtools bamToBed to generate a bed file.

Or use blat and this perl script to convert to bed... https://github.com/mmarchin/utilities/blob/master/parseBlat.pl

ADD REPLY • link 12.9 years ago by Madelaine Gogol 5.3k

0

Entering edit mode

And some aligners such as BLAT will output alignments in BED format...

ADD REPLY • link 12.9 years ago by Malachi Griffith 20k

0

Entering edit mode

@malachig - I don't think BLAT outputs BED. The -out=type is one of: psl - Default. Tab separated format, no sequence pslx - Tab separated format with sequence axt - blastz-associated axt format maf - multiz-associated maf format sim4 - similar to sim4 format wublast - similar to wublast format blast - similar to NCBI blast format blast8- NCBI blast tabular format blast9 - NCBI blast tabul

ADD REPLY • link 12.9 years ago by Casey Bergman 18k

1

Entering edit mode

8.6 years ago

cpcantalapiedra ▴ 170

cat $fastafile | awk '$0 ~ "^>" {name=substr($0, 2); printf name"\t1\t"} $0 !~ "^>" {printf length($0)"\t"name"\n"}'

If each fasta sequence spans several lines, substitute the awk script by:

BEGIN{totallen=-1;} $0 ~ "^>" {if (totallen!=-1) print totallen"\t"name; name=substr($0, 2); printf name"\t1\t"; totallen=0} $0 !~ "^>" {totallen=totallen+length($0);} END{if (totallen!=-1) print totallen"\t"name;}'

ADD COMMENT • link 8.6 years ago by cpcantalapiedra ▴ 170

0

Entering edit mode

Doesn't bed format start with zero as the first base?

ADD REPLY • link 7.8 years ago by KevinL ▴ 70

1

Entering edit mode

6.4 years ago

noahaus ▴ 10

Old question, but in the spirit of good science I will post a script that takes any genome fasta file and creates a genome BED file. Very niche but I don't think there is a good converter out there quite yet.

https://github.com/noahaus/Micellaneous-Tools/blob/master/genome2bed.py

I'd read the comments in the script before beginning.

ADD COMMENT • link 6.4 years ago by noahaus ▴ 10

1

Entering edit mode

Many thanks for this. Strange as it may seem, I've been looking for a simple way to do this for a while. There were a couple of minor issues with the script (the annotation length includes line feeds, and the last line fails to be included unless there's an extra line feed at the end - these may be OS specific) but it does the job perfectly.

ADD REPLY • link 6.4 years ago by mah1 ▴ 10

5

Entering edit mode

You can accomplish this using faidx -i bed genome.fa > out.bed. For more details you can check out the documentation: https://github.com/mdshw5/pyfaidx#cli-script-faidx

ADD REPLY • link 6.4 years ago by Matt Shirley 10k

0

Entering edit mode

Sorry, but this script is totally wrong. It calculates sequences' lengths incorrectly and also doesn't write the last sequence. I've created an Issue on GitHub https://github.com/noahaus/Micellaneous-Tools/issues/1 .

ADD REPLY • link 20 months ago by shelkmike ★ 1.4k

Login before adding your answer.

Similar Posts

Loading Similar Posts

Traffic: 1522 users visited in the last hour

Content Search
Users
Tags
Badges

Help About
FAQ

Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the

version 2.3.6