9.8 years ago
897598644
Excuse me:
After I extracted sequences with the command
bedtools/fastaFromBed -fi /ucsc.hg19.fasta -bed /range.bed -fo /range.out.fasta
the output file looks like this:
>chr2:200172828-200174428
AGTGTCAGTAGTTAAACTCAATT
I do not think this file is in fasta format, because some software cannot recognize it.
So how can I output a fasta format file? Many thanks in advance!
Can you show a few lines of the fasta format that is recognised by the software? I am also curious to know the name of the software.
One fasta format file I downloaded from pubmed was like:
But my file only had two lines: one was chr2:200172828-200174428, and the other contained all the bases.
It's possible that the program is expecting the sequence to be split over multiple lines (it doesn't have to be). You can achieve this by simply using the fold command from the command line (assuming you're not using Windows), such as
fold -w 80 file.fa > file.wrapped.fa
Note that this will produce problems if there are really long header lines, but that's unlikely to occur in bedtools output. You can use biopython/bioperl/etc. in these cases.
-w 80 or -w 70?
Doesn't matter. One will wrap things to 80 characters, the other 70. There's no defined length for lines in a fasta file, though it's quite possible that the program you're using is expecting a single length (it should mention this in its documentation).
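For the case mentioned above where header lines might be too long for fold to handle safely, here is a minimal sketch in plain Python that wraps only the sequence lines and passes header lines through untouched. The function name wrap_fasta and the default width of 80 are just illustrative choices, not part of bedtools or any library.

```python
import io


def wrap_fasta(in_handle, out_handle, width=80):
    """Rewrap sequence lines to `width` columns; header lines are copied as-is."""
    seq_parts = []

    def flush():
        # Write the sequence collected so far in chunks of `width` characters.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            out_handle.write(seq[i:i + width] + "\n")
        seq_parts.clear()

    for line in in_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            flush()                         # finish the previous record first
            out_handle.write(line + "\n")   # header is never wrapped
        elif line.strip():
            seq_parts.append(line.strip())
    flush()                                 # write the last record


# Usage with real files (file names taken from the fold example):
# with open("file.fa") as src, open("file.wrapped.fa", "w") as dst:
#     wrap_fasta(src, dst, width=80)
```

Unlike fold, this also rejoins sequences that were already wrapped at a different width before rewrapping them.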
It does not work either. I used the fasta format file as the reference sequence in the MutationSurveyor software. The problem is that it does not show the position in the human genome but only the position relative to the start of the sequence. In contrast, the fasta format file downloaded from NCBI gives me the absolute position in the human genome.
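As a possible workaround for the relative positions, the header written by bedtools already carries the coordinates, so a relative position can be converted back by hand. The following is only a sketch: it assumes the default chrom:start-end header shown in the question, with start being the 0-based BED start, and the helper names parse_header and to_genomic are made up for illustration. It does not change what MutationSurveyor itself displays.

```python
import re

# Matches headers such as ">chr2:200172828-200174428" (leading ">" optional).
HEADER_RE = re.compile(r"^>?(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)")


def parse_header(header):
    """Return (chrom, start, end) from a chrom:start-end style header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("header is not in chrom:start-end form: %r" % header)
    return match.group("chrom"), int(match.group("start")), int(match.group("end"))


def to_genomic(header, rel_pos):
    """Convert a 1-based position within the extracted sequence
    to a 1-based position on the chromosome.

    Assumes `start` in the header is 0-based (BED convention), so the
    first extracted base sits at genomic position start + 1.
    """
    chrom, start, end = parse_header(header)
    if not 1 <= rel_pos <= end - start:
        raise ValueError("relative position is outside the extracted range")
    return chrom, start + rel_pos


chrom, pos = to_genomic(">chr2:200172828-200174428", 1)
print(chrom, pos)  # chr2 200172829
```

If the header was produced with other options (for example a name column instead of coordinates), the pattern would need to be adjusted.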