Can bedtools output FASTA-format files?
9.8 years ago
897598644 ▴ 100

Hello,

I extracted sequences with the command bedtools/fastaFromBed -fi /ucsc.hg19.fasta -bed /range.bed -fo /range.out.fasta, and the output file looks like this:

>chr2:200172828-200174428
AGTGTCAGTAGTTAAACTCAATT

I do not think this file is in FASTA format, because some software cannot recognize it.

How can I output a proper FASTA file? Many thanks in advance!

Can you show a few lines of a FASTA file that your software does recognize? I am also curious which software this is.

One FASTA file I downloaded from PubMed looked like this:

>gi|224589811:c200335989-200134223 Homo sapiens chromosome 2, GRCh37.p13 Primary Assembly
AGAGGCGCTTAAGTTACCAAGGGATTAGGGCTGATCTCAGGAGAGGTAAACGACCATCCTTGGAACACGG
AGCCCTTCTTCCCTGCCCGGTATCTGCGCGTGCCTTGGGTAGTCCGCACAACCCTCCCCAGCTCCGGATG
CCCTGGGATACCCGGACCCAGGAGAGAGCGCGTCAGCGGGGCGCAGCTACTTTGCACTCGCCGATTCTGA
CACAACAGATAGTTAATTGGGGCCTTCGAAATCAAGGACTAAGGTGAGCAGAGGAGTCCCCCAGCCCCTG

But my file has only two lines: one is the header chr2:200172828-200174428, and the other contains all the bases.

It's possible that the program expects the sequence to be split over multiple lines (FASTA doesn't require this). You can do that with the fold command (assuming you're not on Windows), e.g. fold -w 80 file.fa > file.wrapped.fa. Note that fold wraps every line, so very long header lines would be split too, but that's unlikely with bedtools output. In such cases you can use biopython/bioperl/etc. instead.
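
As a rough illustration of the "use a script instead" route, here is a minimal sketch in plain Python (no biopython needed) that wraps only the sequence lines and leaves ">" header lines untouched, so long headers are never split. The function name wrap_fasta and the default width of 80 are my own choices for illustration, not part of any tool.

```python
def wrap_fasta(text: str, width: int = 80) -> str:
    """Wrap sequence lines of FASTA text to `width`, leaving '>' header lines intact."""
    out = []
    seq_chunks = []

    def flush():
        # Emit the sequence accumulated for the current record in fixed-width lines.
        seq = "".join(seq_chunks)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        seq_chunks.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            flush()            # finish the previous record before starting a new one
            out.append(line)   # headers are copied through unchanged
        elif line.strip():
            seq_chunks.append(line.strip())
    flush()
    return "\n".join(out) + "\n"
```

For a file on disk you could write something like open("range.wrapped.fasta", "w").write(wrap_fasta(open("range.out.fasta").read())), where both file names are placeholders.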

Should I use -w 80 or -w 70?

It doesn't matter. One wraps lines to 80 characters, the other to 70. There's no defined line length for FASTA files, though it's quite possible that the program you're using expects a particular length (its documentation should say so).
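
If you want to check what line width a file actually uses, a small sketch like the one below reports whether each record is "regular" (every sequence line except the last has the same length, and the last is no longer) and which widths occur. This regularity rule is an assumption about what a picky program might want, and the function names are hypothetical.

```python
def record_is_regular(lengths: list) -> bool:
    # All lines but the last share one width; the last line may be shorter.
    if len(lengths) <= 1:
        return True
    width = lengths[0]
    return all(n == width for n in lengths[:-1]) and lengths[-1] <= width


def check_fasta_widths(text: str):
    """Return (all records regular?, set of line widths used by multi-line records)."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([])          # start collecting line lengths for a new record
        elif line.strip() and records:
            records[-1].append(len(line.strip()))
    all_regular = all(record_is_regular(r) for r in records)
    widths = {r[0] for r in records if len(r) > 1}
    return all_regular, widths
```

A single-line record like the bedtools output counts as regular here and contributes no width, which matches the point that unwrapped FASTA is still valid.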

That doesn't work either. I used the FASTA file as the reference sequence in the MutationSurveyor software. The problem is that it does not show positions in the human genome, only relative positions. In contrast, the FASTA file downloaded from NCBI gives me the absolute positions in the human genome.
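
Whether MutationSurveyor reads coordinates from the header at all is not known from this thread. Still, the absolute position can be recovered from the bedtools header yourself. By default bedtools names each sequence chrom:start-end using the BED coordinates, where start is 0-based and end is exclusive. The sketch below (function names hypothetical; assumes the default header without -name or -s) converts an offset within the extracted sequence to a 1-based genome coordinate.

```python
from __future__ import annotations

import re

# Matches a default bedtools getfasta header such as ">chr2:200172828-200174428".
HEADER_RE = re.compile(r"^>?(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)")


def parse_bedtools_header(header: str) -> tuple[str, int, int]:
    """Split a 'chrom:start-end' header into (chrom, 0-based start, exclusive end)."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unrecognized header: {header!r}")
    return m["chrom"], int(m["start"]), int(m["end"])


def genomic_position(header: str, offset: int) -> tuple[str, int]:
    """1-based genome coordinate of the base at 0-based `offset` in the extracted sequence."""
    chrom, start0, end = parse_bedtools_header(header)
    if not 0 <= offset < end - start0:
        raise IndexError("offset outside the extracted interval")
    return chrom, start0 + offset + 1
```

So the first base of the sequence above sits at chr2:200172829 in 1-based terms, and the last base at chr2:200174428.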

9.8 years ago

That's correct FASTA format. If you have a software package that can't handle it, then that software has a bug.
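
To see why both layouts are equivalent, here is a minimal FASTA reader sketch (hypothetical name read_fasta; biopython's Bio.SeqIO.parse(handle, "fasta") does the same job in practice). Any conforming reader concatenates the sequence lines of a record, so the single-line bedtools output and a wrapped version of it yield the same sequence.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Map each header (without '>') to its sequence, joining all sequence lines."""
    chunks: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            chunks[name] = []   # a repeated header would overwrite; fine for a sketch
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


single = ">chr2:200172828-200174428\nAGTGTCAGTAGTTAAACTCAATT\n"
wrapped = ">chr2:200172828-200174428\nAGTGTCAGTA\nGTTAAACTCA\nATT\n"
assert read_fasta(single) == read_fasta(wrapped)
```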
