How Do I Convert Fa Files To Bed Format?
5
3
Entering edit mode
13.0 years ago
Keziah ▴ 80

how do I convert fa files to bed format?

fasta bed • 38k views
ADD COMMENT
2
Entering edit mode

@michael my guess would be that it is due to a confusion of formats & their purposes. happens to the best!

ADD REPLY
0
Entering edit mode

Noahaus, your script gives different results than faidx does..... I believe faidx as it is a community tool thats been around a while, you may want to double check your script (or maybe file a bug with faidx),

ADD REPLY
8
Entering edit mode
13.0 years ago

You can't.

A Fa/Fasta file describes a sequence of DNA/Protein:

>Name
ATAGCTACGATTACGACGTACG
ATCGATCGATGCATCAGCTACT
AACTAGTCGATGATGCATACG...

A bed file describes some features mapped on a genome/sequence:

chr1  786 9879 gene1
chr2  486 979  gene2

The only thing you can do is saying that the BED contains only one feature: your sequence:

Name  0  1098 Name
ADD COMMENT
3
Entering edit mode
13.0 years ago

This is a good, though slightly misguided question. If you want to make a BED file from a FASTA sequence, you might do something like this:

  1. Find your FASTA sequence. We'll use human hemoglobin beta as an example.
  2. Use BLAST to align your sequence to the human reference genome.
  3. In the alignment example above, you would pick the genomic alignment, not transcript, and choose "subject" start and end positions. Note that this example has 3 exons, so your BED file start would be 5186957 and end would be 5188159.
  4. Create a BED file using the chromosome (11 in above example), chromstart (5186957), chromend (5188159), and a name for your gene (HBB). You can then upload this as a custom track in the UCSC genome browser.
  5. You will probably want to add introns to your gene structure, so at this point you can the GenePred format instead of BED. This allows you to specify the strand, as well as introns, exons, and transcription factor binding sites.
ADD COMMENT
2
Entering edit mode
13.0 years ago

Uh, align it to a genome?

ADD COMMENT
1
Entering edit mode

Use bowtie or bwa, generate bam format, then use bedtools bamToBed to generate a bed file.

Or use blat and this perl script to convert to bed... https://github.com/mmarchin/utilities/blob/master/parseBlat.pl

ADD REPLY
0
Entering edit mode

And some aligners such as BLAT will output alignments in BED format...

ADD REPLY
0
Entering edit mode

@malachig - I don't think BLAT outputs BED. The -out=type is one of: psl - Default. Tab separated format, no sequence pslx - Tab separated format with sequence axt - blastz-associated axt format maf - multiz-associated maf format sim4 - similar to sim4 format wublast - similar to wublast format blast - similar to NCBI blast format blast8- NCBI blast tabular format blast9 - NCBI blast tabul

ADD REPLY
1
Entering edit mode
8.6 years ago
cat $fastafile | awk '$0 ~ "^>" {name=substr($0, 2); printf name"\t1\t"} $0 !~ "^>" {printf length($0)"\t"name"\n"}'

If each fasta sequence spans several lines, substitute the awk script by:

BEGIN{totallen=-1;} $0 ~ "^>" {if (totallen!=-1) print totallen"\t"name; name=substr($0, 2); printf name"\t1\t"; totallen=0} $0 !~ "^>" {totallen=totallen+length($0);} END{if (totallen!=-1) print totallen"\t"name;}'
ADD COMMENT
0
Entering edit mode

Doesn't bed format start with zero as the first base?

ADD REPLY
1
Entering edit mode
6.4 years ago
noahaus ▴ 10

Old question, but in the spirit of good science I will post a script that takes any genome fasta file and creates a genome BED file. Very niche but I don't think there is a good converter out there quite yet.

https://github.com/noahaus/Micellaneous-Tools/blob/master/genome2bed.py

I'd read the comments in the script before beginning.

ADD COMMENT
1
Entering edit mode

Many thanks for this. Strange as it may seem, I've been looking for a simple way to do this for a while. There were a couple of minor issues with the script (the annotation length includes line feeds, and the last line fails to be included unless there's an extra line feed at the end - these may be OS specific) but it does the job perfectly.

ADD REPLY
5
Entering edit mode

You can accomplish this using faidx -i bed genome.fa > out.bed. For more details you can check out the documentation: https://github.com/mdshw5/pyfaidx#cli-script-faidx

ADD REPLY
0
Entering edit mode

Sorry, but this script is totally wrong. It calculates sequences' lengths incorrectly and also doesn't write the last sequence. I've created an Issue on GitHub https://github.com/noahaus/Micellaneous-Tools/issues/1 .

ADD REPLY

Login before adding your answer.

Traffic: 2607 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6