I have a txt file of genome assembly gaps, like below:
585 chr10 0 50000 1 N 50000 clone no
78 chr10 5627110 5677110 51 N 50000 clone yes
722 chr10 18014681 18064681 161 N 50000 clone yes
881 chr10 38858841 38908841 337 N 50000 contig no
884 chr10 39194941 39244941 340 N 50000 contig no
13 chr10 39244941 41624941 341 N 2380000 centromere no
902 chr10 41624941 41674941 342 N 50000 contig no
904 chr10 41866693 41916693 344 N 50000 contig no
116 chr10 45746970 45896970 375 N 150000 contig no
A program told me I should convert this to a BED file.
So do I just run cat XXX.txt > XXX.bed ?
If so, why bother with BED at all? Why not just use txt?
What's the point of a BED file?
Thanks
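Note that a plain cat is probably not enough for this particular file. Assuming it is the UCSC `gap` table, its schema is bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge, so the first column is UCSC's internal `bin` index rather than the chromosome. BED expects chrom, start, end as its first three columns. UCSC tables already store chromStart 0-based, like BED, so no coordinate shift is assumed here. A minimal Python sketch, using a hypothetical helper name `gap_txt_to_bed`, that drops `bin` and keeps everything else as extra columns:

```python
import sys


def gap_txt_to_bed(lines):
    """Hypothetical helper: turn UCSC gap-table rows into relaxed BED lines.

    Assumes whitespace-separated columns in the UCSC gap schema:
    bin chrom chromStart chromEnd ix n size type bridge
    """
    bed_lines = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        # Drop the leading bin column; chrom/start/end become BED columns 1-3.
        chrom, start, end = fields[1:4]
        # Keep the remaining columns (ix, n, size, type, bridge) unchanged.
        bed_lines.append("\t".join([chrom, start, end] + fields[4:]))
    return bed_lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gap_to_bed.py XXX.txt > XXX.bed
    with open(sys.argv[1]) as handle:
        for bed_line in gap_txt_to_bed(handle):
            print(bed_line)
```

The output is tab-separated, which is what most BED tools expect. Many of them also want input sorted by chromosome and start; BEDOPS ships a `sort-bed` utility for that step.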
BED is a simple text file. Tools such as BEDOPS will do all sorts of logic and other computations for you (Which elements overlap between these N input files? What is the trimmed mean of all ChIP-seq scores falling in every 100 kb window across the genome? And so on.)

The actual BED format has a fairly strict definition, but various tool suites relax its constraints so that only the first three fields (chrom, start, end) need to be specified for many operations, while all other columns are essentially free to be whatever you need. This lets a tool suite work together with standard Unix commands to manipulate data on the fly without losing any information.

In fact, this very simple relaxation of the BED format can encode the information kept in any of the 20 or so other formats you'll commonly encounter in 'bioinformatics' (VCF, GFF, GTF, SAM, WIG, BEDGRAPH, etc.). That is, a small extension to the usual BED format can represent anything these other formats offer with no loss of data (see the conversion scripts offered in BEDOPS). However, conversions in the other direction often do not exist in the general case; for example, SAM/BAM is unable to hold signal data.

The better question, imo, is: why do we have so many file formats, and tool suites to operate on each kind of format, when these formats are hardly more than shuffled-column versions of each other?
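To make the "which elements overlap" example concrete: because relaxed BED only fixes the first three columns, an overlap query needs to interpret nothing else, and any extra columns just ride along. Below is a naive Python sketch of that idea. It is not how BEDOPS implements it, the names `parse_bed` and `overlaps` are hypothetical, and it assumes tab-separated lines with BED's 0-based, half-open coordinates.

```python
from bisect import bisect_left
from collections import defaultdict


def parse_bed(lines):
    """Parse relaxed BED: only chrom, start, end are interpreted.

    Any further columns are kept verbatim as a list of strings.
    """
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    return records


def overlaps(a_records, b_records):
    """Yield (a, b) pairs whose half-open intervals overlap on the same chrom."""
    by_chrom = defaultdict(list)
    for rec in b_records:
        by_chrom[rec[0]].append(rec)
    starts = {}
    for chrom, recs in by_chrom.items():
        recs.sort(key=lambda r: r[1])  # sort B intervals by start
        starts[chrom] = [r[1] for r in recs]
    for a in a_records:
        chrom, a_start, a_end, _ = a
        # Only B intervals that start before a_end can overlap a.
        hi = bisect_left(starts.get(chrom, []), a_end)
        for b in by_chrom.get(chrom, [])[:hi]:
            if b[2] > a_start:  # ...and they must end after a_start
                yield a, b
```

With the gaps above, a query region chr10:39200000-39300000 would hit both the contig gap and the centromere. The contig gap ending at 39244941 and the centromere starting at 39244941 do not overlap each other, because the end coordinate is exclusive.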