Hi.
I am considering saving some genomic data to a BED file. However, I am a bit concerned about the format and what, exactly, distinguishes the track lines from the content.
Take these examples:
The header can have a variable number of lines, and there isn't, as far as I know, a limited set of starting keywords ("track" and "browser" are two, but I have seen others and, furthermore, they can span several lines). I don't want to search for "chr*", as people sometimes save chromosomes with only their number and, anyway, there could be contigs or scaffolds or whatever. The only difference I can see is that the content has tabs and the header has spaces, but if I use this to decide where the content starts, an accidental tab in the header would mess things up. And, apparently, the content COULD be space-delimited.
thanks
Hello Stefano, it is really difficult to distinguish. I would check whether the 2nd AND 3rd fields (the required start and end fields) are coordinates, which works independently of whether the file is tab- or space-delimited.
AndreiR
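A minimal sketch of this coordinate check in Python, in case it helps. The function names (`looks_like_bed_record`, `first_record_index`) are made up for illustration, and the extra `start <= end` test is my own assumption about what counts as a sane record, not something stated in this thread.

```python
def _is_coordinate(field):
    # Accept only plain ASCII digits, so "name=foo", "position" or "1e5" fail.
    return field.isascii() and field.isdigit()


def looks_like_bed_record(line):
    """Return True if fields 2 and 3 look like start/end coordinates."""
    # split() with no argument splits on any run of whitespace,
    # so tab-delimited and space-delimited lines are handled alike.
    fields = line.split()
    if len(fields) < 3:
        return False
    start, end = fields[1], fields[2]
    if not (_is_coordinate(start) and _is_coordinate(end)):
        return False
    # Assumption: a record's start should not exceed its end.
    return int(start) <= int(end)


def first_record_index(lines):
    """Return the index of the first line that looks like a record, or None."""
    for index, line in enumerate(lines):
        if looks_like_bed_record(line):
            return index
    return None
```

Everything before the returned index can then be treated as header, whatever keyword it starts with.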
Very good question. Usually I consider that all the lines not starting with "browser" or "track" are the content lines.
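Roughly, in Python, that rule could look like the sketch below. The names `is_header_line` and `split_bed` are hypothetical, and treating blank lines and "#" comment lines as header is an extra assumption of mine on top of the "browser"/"track" rule.

```python
HEADER_KEYWORDS = ("browser", "track")


def is_header_line(line):
    """Return True for browser/track lines (plus blank and '#' lines)."""
    stripped = line.strip()
    # Assumption: blank lines and '#' comments are not content either.
    if not stripped or stripped.startswith("#"):
        return True
    # Compare the whole first token, so a contig called
    # "tracking_scaffold_1" is not mistaken for a track line.
    first_token = stripped.split(None, 1)[0]
    return first_token in HEADER_KEYWORDS


def split_bed(lines):
    """Split an iterable of lines into (header_lines, content_lines)."""
    header, content = [], []
    for line in lines:
        target = header if is_header_line(line) else content
        target.append(line.rstrip("\n"))
    return header, content
```

For a file on disk, something like `with open(path) as handle: header, content = split_bed(handle)` would do, where `path` is whatever file you are reading.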
Except that, it seems, they can span several lines. In the example, one line starts with "itemRgb="On"".
no, in that example, itemRgb is on the same line as "track name". See the raw file of the example here: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt
thanks, if that is the case, then it's not too hard. I haven't found a document that "officially" states something like that, though...
yes, that might work actually... columns 2 and 3 are always numbers in the content, but never in browser or track lines...