Question

Does bedtools output FASTA format files?

9.8 years ago

897598644 ▴ 100

Hi,

After extracting sequences with the command bedtools/fastaFromBed -fi /ucsc.hg19.fasta -bed /range.bed -fo /range.out.fasta, I got an output file like this:

>chr2:200172828-200174428
AGTGTCAGTAGTTAAACTCAATT

I don't think this file is in FASTA format, because some software cannot recognize it.

So how can I output a FASTA format file? Many thanks in advance!

Tags: next-gen-sequencing, genome-sequence

Written 9.8 years ago by 897598644 · updated 2.6 years ago by Ram

Can you show a few lines of a FASTA file that your software does recognize? I'm also curious which software you're using.

Reply by GouthamAtla · 9.8 years ago

One FASTA file I downloaded from NCBI looked like this:

>gi|224589811:c200335989-200134223 Homo sapiens chromosome 2, GRCh37.p13 Primary Assembly
AGAGGCGCTTAAGTTACCAAGGGATTAGGGCTGATCTCAGGAGAGGTAAACGACCATCCTTGGAACACGG
AGCCCTTCTTCCCTGCCCGGTATCTGCGCGTGCCTTGGGTAGTCCGCACAACCCTCCCCAGCTCCGGATG
CCCTGGGATACCCGGACCCAGGAGAGAGCGCGTCAGCGGGGCGCAGCTACTTTGCACTCGCCGATTCTGA
CACAACAGATAGTTAATTGGGGCCTTCGAAATCAAGGACTAAGGTGAGCAGAGGAGTCCCCCAGCCCCTG

But my file has only two lines: the header chr2:200172828-200174428, and a single line containing all the bases.

Reply by 897598644 · 9.8 years ago · updated 2.6 years ago by Ram

It's possible that the program expects the sequence to be split over multiple lines (FASTA doesn't require this). You can wrap it with the fold command (assuming you're not on Windows), e.g. fold -w 80 file.fa > file.wrapped.fa. Note that fold wraps every line, so very long header lines would be broken too, but that's unlikely in bedtools output. In those cases you can use Biopython/BioPerl/etc.

Reply by Devon Ryan · 9.8 years ago
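
For the long-header case mentioned above, here is a minimal sketch in plain Python of header-aware wrapping: only sequence lines are wrapped, and lines starting with ">" are passed through unchanged. The name wrap_fasta and the 80-column default are illustrative choices, not part of bedtools or any library; Biopython or BioPerl would be the usual tools in practice.

```python
def wrap_fasta(text: str, width: int = 80) -> str:
    """Wrap sequence lines to `width` characters; '>' header lines are left intact."""
    out = []
    seq_parts = []

    def flush() -> None:
        # Emit the buffered sequence of the current record in fixed-width chunks.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()  # finish the previous record before starting a new one
            out.append(line)
        else:
            seq_parts.append(line)
    flush()
    return "\n".join(out) + "\n" if out else ""


# Example: a single-line record like the bedtools output, wrapped at 70 columns.
print(wrap_fasta(">chr2:200172828-200174428\n" + "ACGT" * 25 + "\n", width=70))
```

Because the whole sequence of a record is joined before re-chunking, this also re-wraps files that were already wrapped at a different width.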

Should I use -w 80 or -w 70?

Reply by 897598644 · 9.8 years ago

It doesn't matter: one wraps lines at 80 characters, the other at 70. FASTA has no defined line length, though the program you're using may well expect a particular one (its documentation should mention this).

Reply by Devon Ryan · 9.8 years ago
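
To check what line length a file actually uses, a quick test along these lines may help. This is a hypothetical helper (is_uniformly_wrapped is not from any library): within each record it requires every sequence line except the last to have the same length (exactly `width`, if one is given), with the last line no longer than that. A single-line record like the bedtools output passes when no width is given, but fails for any width shorter than the sequence.

```python
from typing import List, Optional


def _sequence_lines_per_record(text: str) -> List[List[str]]:
    """Group non-empty sequence lines by record; a '>' header starts a new record."""
    records: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([])
        elif records:  # ignore anything before the first header
            records[-1].append(line)
    return records


def is_uniformly_wrapped(text: str, width: Optional[int] = None) -> bool:
    """True if, in every record, all sequence lines but the last share one length
    (equal to `width` when given) and the last line is not longer than it."""
    for lines in _sequence_lines_per_record(text):
        if not lines:
            continue  # header with no sequence: nothing to check
        w = width if width is not None else len(lines[0])
        if any(len(ln) != w for ln in lines[:-1]) or len(lines[-1]) > w:
            return False
    return True


bedtools_style = ">chr2:200172828-200174428\n" + "A" * 1600 + "\n"
print(is_uniformly_wrapped(bedtools_style))            # True
print(is_uniformly_wrapped(bedtools_style, width=70))  # False
```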

That doesn't work either. I used the FASTA file as the reference sequence in MutationSurveyor. The problem is that it shows only relative positions instead of positions in the human genome, whereas with the FASTA file downloaded from NCBI it reports the absolute positions in the human genome.

Reply by 897598644 · 9.8 years ago
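
If a tool only reports positions relative to the start of the reference sequence, they can be converted back by hand from the bedtools header. A minimal sketch, assuming the header is chrom:start-end with the BED interval copied unchanged (0-based start, half-open end), as in the output above, and that the region is read on the plus strand; to_genome_position is a hypothetical name, and this is not how MutationSurveyor itself handles coordinates.

```python
import re
from typing import Tuple

# Matches headers such as ">chr2:200172828-200174428" (the leading '>' is optional).
HEADER_RE = re.compile(r"^>?(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)")


def to_genome_position(header: str, offset: int) -> Tuple[str, int]:
    """Map a 1-based position within the extracted sequence to a 1-based
    genomic coordinate, assuming a BED-style (0-based, half-open) header
    interval and a plus-strand sequence."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"unrecognized header: {header!r}")
    chrom, start, end = m.group("chrom"), int(m.group("start")), int(m.group("end"))
    if not 1 <= offset <= end - start:
        raise ValueError(f"offset {offset} is outside the {end - start} bp interval")
    # 0-based BED start + 1-based offset = 1-based genomic coordinate.
    return chrom, start + offset


print(to_genome_position(">chr2:200172828-200174428", 1))     # ('chr2', 200172829)
print(to_genome_position(">chr2:200172828-200174428", 1600))  # ('chr2', 200174428)
```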

score 2 · Answer 1 · 2015-03-11

9.8 years ago

Devon Ryan 105k

That's correct FASTA format. If a software package can't handle it, that software has a bug.
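
To illustrate the point: a FASTA reader should treat a single-line record and a wrapped record identically. A minimal parser sketch (parse_fasta is a hypothetical helper; Biopython's SeqIO does this job in practice) that joins all sequence lines belonging to each header:

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines.

    Lines before the first header are ignored; a repeated header overwrites
    the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


single = ">chr2:1-9\nACGTACGTA\n"
wrapped = ">chr2:1-9\nACGT\nACGT\nA\n"
print(parse_fasta(single) == parse_fasta(wrapped))  # True
```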