Question

What Distinguishes Header From Content In BED Files?

4

11.2 years ago

Stefano Berri 4.4k

Hi.

I am considering saving some genomic data to a BED file. However, I am a bit concerned about the format and what, exactly, distinguishes the track lines from the content.

Take these examples:

The header can have a variable number of lines, and there isn't, as far as I know, a limited set of starting keywords (track and browser are two, but I have seen others and, furthermore, they can span several lines). I don't want to search for "chr*", as people sometimes save chromosomes with only their number and, anyway, there could be contigs, scaffolds or whatever. The only difference I can see is that the content has tabs and the header has spaces, but if I use this to decide where the content starts, an accidental tab in the header would break the parsing. And, apparently, the content COULD be space delimited.

Thanks

bed parsing format • 11k views

ADD COMMENT • link updated 10.2 years ago by Biostar 20 • written 11.2 years ago by Stefano Berri 4.4k

2

Hello Stefano, it is really difficult to distinguish them. I would check whether the 2nd AND 3rd fields (the required fields) are coordinates, which works for both tab- and space-delimited files.
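A minimal sketch of that check in Python, assuming fields may be separated by any whitespace and that a content line is any line whose 2nd and 3rd fields are non-negative integers. The function names are made up for illustration:

```python
def is_coordinate(field):
    """True if the field is a plain non-negative integer such as '127471196'."""
    return field.isascii() and field.isdigit()


def looks_like_bed_record(line):
    """Guess whether a line is BED content by inspecting fields 2 and 3."""
    fields = line.split()  # splits on tabs and spaces alike
    if len(fields) < 3:
        return False
    return is_coordinate(fields[1]) and is_coordinate(fields[2])
```

Because the chromosome label in field 1 is never inspected, names like "1", contigs or scaffolds are all accepted.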

AndreiR

ADD REPLY • link 11.2 years ago by AndreiR ▴ 260

1

Very good question. Usually I consider that all the lines not starting with "browser" or "track" are the content lines.
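As a rough sketch of this rule in Python, assuming every header line is a single line whose first word is exactly "browser" or "track" (the function name `split_header` is hypothetical):

```python
HEADER_KEYWORDS = ("browser", "track")


def split_header(lines):
    """Separate header lines from content lines by their first word."""
    header, content = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        first_word = line.split(None, 1)[0]
        if first_word in HEADER_KEYWORDS:
            header.append(line)
        else:
            content.append(line)
    return header, content
```

Comparing the whole first word, rather than using a prefix match, avoids misclassifying a sequence that happens to be named something like "track1".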

ADD REPLY • link 11.2 years ago by Giovanni M Dall'Olio 28k

0

Except that, it seems, they can span several lines. In the example, one line starts with "itemRgb="On"".

ADD REPLY • link 11.2 years ago by Stefano Berri 4.4k

1

no, in that example, itemRgb is on the same line as "track name". See the raw file of the example here: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt

ADD REPLY • link 11.2 years ago by Giovanni M Dall'Olio 28k

1

Thanks, if that is the case, then it's not too hard. I haven't found a document that "officially" states something like that, though...

ADD REPLY • link 11.1 years ago by Stefano Berri 4.4k

0

Yes, that might actually work... columns 2 and 3 are always numbers in the content, but never in browser or track lines...

ADD REPLY • link 11.2 years ago by Stefano Berri 4.4k

score 1 · Answer 1 · 2014-01-17

In my view, the BED format generally represents a tab-delimited file whose lines mostly start with chromosome, start and end; in peak files, the fourth column could then be strand, peak height, size, width, confidence, etc. Files like the ones in your examples, which have track lines, are generally meant for visualization in the browser and can be further classified as bigBed, wig or bigWig. I wouldn't confuse the two. Most BED files (like those publicly available in GEO) won't have track lines, whereas wig or bigBed files will.

If you want the file for visualization, make a custom header that stays constant for you as a single or double line containing name, description, color, type, etc. Alternatively, make a track file that gathers and controls all of your tracks (normal BED files), so you don't have to annotate each BED file separately.
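For illustration only, a constant single-line header could be generated along these lines. This is a sketch assuming the UCSC-style `name`, `description` and `color` track attributes; the function names and the sample values are made up:

```python
def format_track_line(name, description, color):
    """Build a one-line track header; color is an (R, G, B) tuple."""
    r, g, b = color
    return f'track name="{name}" description="{description}" color={r},{g},{b}'


def bed_lines(track_line, records):
    """Return the track line followed by tab-delimited BED records."""
    lines = [track_line]
    for chrom, start, end in records:
        lines.append(f"{chrom}\t{start}\t{end}")
    return lines
```

Since the header is always exactly one line here, a reader of the file only has to skip the first line to reach the content.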

More info : http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

http://genome.ucsc.edu/goldenPath/help/bigWig.html (Point #7)

Cheers

score 0 · Answer 2 · 2014-01-17

0

11.2 years ago

Alex Reynolds 36k

I don't want to search for "chrN" as sometimes people save chromosomes only with their number

If they are doing that, they aren't following convention. For whatever it's worth, people and research consortia choosing not to follow the spec is not a problem restricted to the BED format.

ADD COMMENT • link 11.2 years ago by Alex Reynolds 36k

0

I know, but, in general, the reference genome might not have only chrN sequences. If I use a reference genome with contigs or scaffolds, for example, or the 1000 Genomes reference with decoy sequences, the names, I think, won't look like chrN.

ADD REPLY • link 11.2 years ago by Stefano Berri 4.4k

score 0 · Answer 3 · 2014-05-22

0

10.8 years ago

Whoknows ▴ 960

Hi friends

I had this problem, and BEDOPS solved it; its sam2bed tool is very good at this. After converting to a BED file, you can determine the BED file header from the sam2bed information page:

http://bedops.readthedocs.org/en/latest/content/reference/file-management/conversion/sam2bed.html

Enjoy.

ADD COMMENT • link 10.8 years ago by Whoknows ▴ 960

score 0 · Answer 4 · 2014-05-22

0

10.8 years ago

Jorge Amigo 14k

You could always look at chromosome positions rather than chromosome labels. A pattern such as ^\S+\t\d+\t\d+ will help you distinguish headers from data lines.
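A small Python sketch of how that pattern could be applied, using the standard `re` module (the names `BED_RECORD` and `is_data_line` are just for illustration):

```python
import re

# One non-whitespace label, then two tab-separated integer coordinates.
BED_RECORD = re.compile(r"^\S+\t\d+\t\d+")


def is_data_line(line):
    """True if the line starts like a tab-delimited BED record."""
    return BED_RECORD.match(line) is not None
```

As written, the pattern requires tabs, so a space-delimited record would be treated as a header. Replacing each `\t` with `\s+` might cover that case, but that is an assumption rather than part of the pattern above.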

ADD COMMENT • link 10.8 years ago by Jorge Amigo 14k