Hi,
I want to add 2000 bp on either side of each interval in "Alu elements", a file in BED format.
For that purpose, I need to download the human genome (hg19) so that my bedtools slop command can add these base pairs accordingly. I was not sure which format the genome has to be in, so I downloaded it from UCSC by going to their directory goldenPath/hg19/chromosomes and typing: mget -a
I unzipped them (all were now .fa files) and then wanted to combine all the chromosomes together in one file: cat *.fa > hg19.fasta
but when I run
bedtools slop -i Alu_elements -g hg19.fasta -b 2000
I get the following error message: Less than the req'd two fields were encountered in the genome file (hg19.fasta) at line 1. Exiting.
I am not sure where the problem is. Is it in the genome and how I unzipped or combined it? Before combining, there were separate files for each chromosome: chr1.fa, chr1.gl000194_random.fa, etc.
Does the bedtools command need the genome to be in BED format as well? If yes, how do I download the genome in this format? I tried to find it in the UCSC Table Browser, but there are so many options under "Tables" and "Tracks" and I don't know which to choose to download the whole genome, not just specific elements within it.
Thanks!
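If I am not wrong, the genome file that bedtools slop expects is not a FASTA file but a tab-delimited text file with one line per chromosome, in the format <chrname> <size>. That would explain the error: line 1 of hg19.fasta is a FASTA header, so bedtools finds fewer than the two fields it requires. The sizes file for hg19 should be available here: https://genome.ucsc.edu/goldenpath/help/hg19.chrom.sizes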
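
If you would rather build the sizes file from the FASTA you already combined, something like the following Python sketch might work. It assumes a plain, uncompressed FASTA. The function names and the output file name hg19.genome are made up for illustration; they are not part of bedtools.

```python
from typing import Iterable, List, Tuple


def fasta_chrom_sizes(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (sequence name, length) pairs in the order they appear."""
    sizes: List[Tuple[str, int]] = []
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                sizes.append((name, length))
            # Keep only the first word of the header: ">chr1 desc" -> "chr1".
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        else:
            # Sequence line: count its bases.
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def write_genome_file(fasta_path: str, genome_path: str) -> None:
    """Write one tab-delimited '<chrname><TAB><size>' line per sequence."""
    with open(fasta_path) as fasta, open(genome_path, "w") as out:
        for name, length in fasta_chrom_sizes(fasta):
            out.write(f"{name}\t{length}\n")
```

Calling write_genome_file("hg19.fasta", "hg19.genome") should then give a file you can pass to your original command in place of the FASTA: bedtools slop -i Alu_elements -g hg19.genome -b 2000. The downloaded hg19.chrom.sizes should work the same way and saves you the step.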