Trying to get genome for bedtools
8.3 years ago
radwa.raed ▴ 40

Hi,

I want to add 2000 bp on either side of each interval in "Alu elements", a file in BED format.

For that purpose, I need to download the human genome (hg19) to use with my bedtools slop command so it can add these base pairs accordingly. I am not sure which format the genome has to be in, so I downloaded it from UCSC by going to their directory goldenPath/hg19/chromosomes and then typing: mget -a

I unzipped them (all were now .fa files) and then wanted to combine all the chromosomes together in one file: cat *.fa > hg19.fasta

but when I run

bedtools slop -i Alu_elements -g hg19.fasta -b 2000

I get the following error message: Less than the req'd two fields were encountered in the genome file (hg19.fasta) at line 1. Exiting.

  1. I am not sure where the problem is: Is it in the genome and how I unzipped / combined it..? Before combining, there were separate files for each chromosome: chr1.fa, chr1.gl000194_random.fa, etc...

  2. Does the bedtools command need the genome to be in a BED format as well? If yes, how do I download the genome in this format? I tried to find it on the UCSC page table browser, but there are so many options under "Tables" and "Tracks" and I don't know which to choose to download the whole genome, not just specific elements within it.

Thanks!


If I am not wrong, the genome file is usually not a FASTA file but a two-column, tab-delimited file of the form <chrname> <size>. The sizes file should be available here: https://genome.ucsc.edu/goldenpath/help/hg19.chrom.sizes

8.3 years ago
igor 13k

As others already pointed out, the bedtools genome file is also known as a chrom.sizes file. If you can't download it, you can generate it yourself from an indexed FASTA file:

samtools faidx genome.fa
cut -f 1,2 genome.fa.fai > chrom.sizes
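
If samtools is not installed, the same two columns can probably be derived with awk alone. This is only a minimal sketch, assuming a plain uncompressed FASTA with Unix line endings; toy.fa and toy.chrom.sizes are made-up names and the sequences are toy data, not hg19:

```shell
# Build a tiny two-sequence FASTA so the sketch is self-contained (toy data).
printf '>chrA\nACGTACGT\nACGT\n>chrB\nGGCC\n' > toy.fa

# Header lines start a new sequence: keep the first word after ">" as its name.
# Every other line adds its length to the running total for the current name.
awk '
  /^>/ { name = substr($1, 2); order[++n] = name; next }
       { len[name] += length($0) }
  END  { for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], len[order[i]] }
' toy.fa > toy.chrom.sizes

# Show the resulting tab-delimited <name> <size> file.
cat toy.chrom.sizes
```

For a real genome, samtools faidx is the safer route, since it also validates the FASTA layout.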

.fai file can also be directly used with -g. As per bedtools: ".fai files may be used as genome (-g) files."

8.3 years ago

When in doubt, read the manual:

1.3.10 What is a “genome” file?

Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of the chromosomes for the organism for which your BED files are based. When using the UCSC Genome Browser, Ensembl, or Galaxy, you typically indicate which species / genome build you are working with. The way you do this for BEDTools is to create a “genome” file, which simply lists the names of the chromosomes (or scaffolds, etc.) and their size (in base pairs).

Genome files must be tab-delimited and are structured as follows (this is an example for C. elegans):

chrI	15072421
chrII	15279323
chrX	17718854
chrM	13794

BEDTools includes predefined genome files for human and mouse in the /genomes directory included in the BEDTools distribution.
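
The "Less than the req'd two fields" error in the question is what you would expect when a FASTA file is passed where this two-column layout is required. A quick sanity check before calling bedtools could look like the sketch below. check_genome is a hypothetical helper written for this thread, not part of bedtools, and the file names are made up:

```shell
# Toy inputs: one valid genome file and one FASTA-like file.
printf 'chr1\t249250621\nchr2\t243199373\n' > ok.genome
printf '>chr1\nACGTACGT\n' > not_a_genome.fa

# check_genome (hypothetical helper) succeeds only if every line has at least
# two tab-separated fields and the second field is a whole number.
check_genome() {
  awk -F '\t' '
    NF < 2 || $2 !~ /^[0-9]+$/ {
      printf "line %d is not <name><TAB><size>: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

check_genome ok.genome && echo "ok.genome looks usable"
check_genome not_a_genome.fa || echo "not_a_genome.fa is not a genome file"
```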

8.3 years ago
radwa.raed ▴ 40

Thank you so much. This makes a lot of sense. I don't need the sequence itself, just the length of each chromosome.

I implemented what you said but am still stuck:

I tried two things. First, I went to the /genomes directory of bedtools and copied hg19.genome into the same directory as my Alu_elements file, but I get an error that the file could not be opened:

bedtools slop -i Alu_elements.bed -g hg19.genome -b 2000

Error: The requested file (Alu_elements.bed) could not be opened. Error message: (No such file or directory). Exiting!

OR

by downloading a BED file from the link posted above.

bedtools slop -i Alu_elements.bed -g hg19.chrom.sizes.BED -b 2000

Error: The requested genome file (hg19.chrom.sizes.BED) could not be opened. Exiting!

Samples from hg19.chrom.sizes.BED:
chr1	249250621
chr2	243199373
chr3	198022430

Samples from Alu_elements.BED:
chr1	16777160	16777470	AluSp	2147	+
chr1	25165800	25166089	AluY	2626	-

Could the 'extra' columns in Alu_elements.BED be throwing it off? I am unsure..

Many thanks!
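
On the extra columns: as far as I know, slop only changes the start and end columns and passes the rest through, so a six-column BED should not be the problem. With -b, the arithmetic is start minus 2000 and end plus 2000, clipped to 0 and to the chromosome length. The awk sketch below only imitates that arithmetic for illustration and is not a replacement for bedtools. The first record and the chr1 size are taken from the samples above; the second record (madeUp) is invented to show clipping at the chromosome start, and all file names are hypothetical:

```shell
# Toy genome file and BED file (second BED record is invented).
printf 'chr1\t249250621\n' > toy.genome
printf 'chr1\t16777160\t16777470\tAluSp\t2147\t+\nchr1\t500\t800\tmadeUp\t0\t-\n' > toy.bed

# First file fills a size lookup; second file gets start/end widened by b,
# with start clipped at 0 and end clipped at the chromosome length.
awk -v b=2000 'BEGIN { FS = OFS = "\t" }
  NR == FNR { size[$1] = $2 + 0; next }
  {
    $2 = ($2 - b < 0) ? 0 : $2 - b
    $3 = ($3 + b > size[$1]) ? size[$1] : $3 + b
    print
  }' toy.genome toy.bed > toy.slop.bed

cat toy.slop.bed
```

Columns 4 to 6 come out unchanged, which is also what you should see in real slop output.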


Have you tried to explicitly do this (provided both files are in the directory you are running this from)?

bedtools slop -i ./Alu_elements.bed -g ./hg19.genome -b 2000

Yes but when I try to run it, I receive this error msg

Error: The requested genome file (./hg19.chrom.sizes.bed) could not be opened. Exiting!

but outside of the command line, I can open the file itself and see the entries


I am in the correct directory..

https://postimg.org/image/ph35bjigr/


From that screenshot, I see there is hg19.chrom.sizes.BED.txt, but not hg19.chrom.sizes.bed, which is what you specify.

Also, there is Alu_elements.BED.txt and Alu_elements, but not Alu_elements.BED, which is what you specify.

Run ls and then copy and paste the proper file names into your command. Don't try to type them manually.
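
A small sketch of that check, using empty placeholder files named like the ones in the screenshot. demo_dir is a made-up directory so nothing in your real folder is touched:

```shell
# Recreate the situation from the thread with empty placeholder files.
mkdir -p demo_dir
touch demo_dir/Alu_elements.BED.txt demo_dir/hg19.chrom.sizes.BED.txt

# List every name exactly as stored, so a trailing .txt is visible.
ls -1 demo_dir

# Test each candidate name before handing it to bedtools.
for f in Alu_elements.bed Alu_elements.BED.txt hg19.chrom.sizes.BED.txt; do
  if [ -e "demo_dir/$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```

Only the names reported as found will work with -i and -g; the extension itself does not matter to bedtools as long as the content is tab-delimited.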


I see, please correct me if I am wrong, but I thought a BED file is a tab-delimited one. And I read that to create a BED file you need to save as tab-delimited and then add in .BED at the end. Did I misunderstand?


Yes, but you need to specify the filename exactly as it is (what you see when you run ls). The filename you gave the file and the filename that you give to bedtools are not the same.
