What next-gen file formats can be converted to BED format, and how? What does it mean to sort a BED file?
What next gen file format can be converted to Bed format
Any format whose records contain, at least, a chromosome name, a start index, and a stop index.
and how
If the format is text-based or can be converted into text format, you can use UNIX tools like cut and awk.
To save time and for reproducibility, people have written and published conversion scripts, such as the *2bed scripts in BEDOPS.
Another advantage of scripts like these is that they take care of index changes, which are easy to mess up and can otherwise cause errors in downstream analysis.
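To make the index change concrete, here is a minimal sketch in Python of a GFF-to-BED conversion. GFF coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open, so the start is decremented by one and the end is left unchanged. The function name `gff_line_to_bed`, the choice to put the feature type in the BED name column, and the choice to write a score of 0 when GFF has "." are assumptions made for illustration; this is not how the BEDOPS scripts are implemented.

```python
def gff_line_to_bed(line):
    """Convert one GFF/GTF feature line to a BED6 line.

    Returns None for blank lines and comment lines.
    Hypothetical helper for illustration only.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, _source, feature, start, end, score, strand = fields[:7]
    bed_start = int(start) - 1  # 1-based inclusive start -> 0-based start
    bed_end = int(end)          # inclusive end equals the half-open end
    # Assumption: GFF uses "." for a missing score; write 0 in that case.
    bed_score = "0" if score == "." else score
    return "\t".join([chrom, str(bed_start), str(bed_end), feature, bed_score, strand])


example = "chr1\tdemo\texon\t11\t20\t.\t+\t.\tID=exon1"
print(gff_line_to_bed(example))
```

The ten bases 11 to 20 in GFF become the interval starting at 10 and ending at 20 in BED, which still has length ten. Forgetting the subtraction shifts every feature by one base, which is the kind of downstream error mentioned above.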
What does it mean to sort a Bed file.
Sorting often, but not always, means lexicographical sorting on chromosome name, followed by numerical sorting on the start index, followed by numerical sorting on the stop index. This can be done efficiently with tools like BEDOPS sort-bed or, less efficiently, with GNU coreutils sort. Sorting was originally done for the BEDOPS binaries that perform set and map operations, which gain significant performance and memory improvements from sorted input.
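As a rough illustration of that sort order, here is a short Python sketch that orders BED lines by chromosome name (as a plain string), then start, then stop (as integers). The names `bed_sort_key` and `sort_bed_lines` are made up for this example, and it loads everything into memory, so it shows the ordering only and is not a replacement for sort-bed on large files.

```python
def bed_sort_key(line):
    """Key: chromosome as a string, start and stop as integers."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[1]), int(fields[2]))


def sort_bed_lines(lines):
    """Return the BED records in sorted order, skipping blank and header lines."""
    records = [
        line for line in lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    return sorted(records, key=bed_sort_key)


unsorted = [
    "chr2\t5\t10",
    "chr10\t1\t4",
    "chr1\t300\t400",
    "chr1\t20\t50",
    "chr1\t20\t30",
]
for record in sort_bed_lines(unsorted):
    print(record)
```

Note that chr10 comes before chr2 here because the chromosome name is compared as text, not as a number. Python compares strings by code point, which is comparable to running GNU sort under the C locale.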
Any file format that has chromosome name, start and end columns can be converted to bed format.
See here: http://genome.ucsc.edu/FAQ/FAQformat.html#format1
Some formats that can be converted to BED are VCF, GFF, GTF, BAM, etc.
If you read the link I have provided, it says that for BED format you need a chromosome name, a start position and an end position. These three columns are compulsory. FASTQ reads carry no coordinates, so they will have to be mapped first, and then the mapping positions of the reads can be converted to BED format.
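For a format such as VCF, which stores a single 1-based position per record instead of a start and end, the three compulsory columns have to be derived. The Python sketch below uses one possible convention, which is an assumption and not the only one in use: the interval starts at POS minus one and spans the length of the REF allele. The function name `vcf_line_to_bed` is hypothetical.

```python
def vcf_line_to_bed(line):
    """Convert one VCF data line to a BED4 line (chrom, start, end, ID).

    Returns None for blank lines and header lines.
    Assumption: the interval covers the REF allele.
    """
    if not line.strip() or line.startswith("#"):
        return None
    chrom, pos, var_id, ref = line.rstrip("\n").split("\t")[:4]
    start = int(pos) - 1    # VCF POS is 1-based; BED start is 0-based
    end = start + len(ref)  # half-open end, spanning the REF allele
    return "\t".join([chrom, str(start), str(end), var_id])


vcf_example = "chr1\t100\trs1\tAT\tA\t.\tPASS\t."
print(vcf_line_to_bed(vcf_example))
```

Other tools may choose a different span for insertions and deletions, so check what your conversion tool does before comparing its output with other BED files.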
Note that there are two distinct BED formats. One is the binary BED used in PLINK to store SNP genotype data, which is still used but being abandoned for VCF. The other is the text BED used in the UCSC Genome Browser to visualize annotation data.