Hello, I would like to know how to use Bedtools to extract promoter sequences (as FASTAs) from the mouse genome (mm9) starting from a BED file.
As an example, let's say you define your promoter as the 2 kb upstream of your gene, and you have a BED file with the chrom, txStart, txEnd, name, num_exons, and strand for each gene you are interested in. Something like the following:
head -n4 genes.bed
chr1 134212701 134230065 Nuak2 8 +
chr1 134212701 134230065 Nuak2 7 +
chr1 33510655 33726603 Prim2, 14 -
chr1 25124320 25886552 Bai3, 31 -
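bedtools reads BED and genome files as tab-delimited text, while the listing above is shown with spaces (as often happens after copying from a web page). Below is a minimal sketch to run before flank. It assumes your gene list is in a whitespace-delimited file called genes.txt, which is a hypothetical name. The script rebuilds each line with tabs and then checks that every line has the six expected fields.

```shell
#!/bin/sh
# Hypothetical input: the four example genes, space-separated as displayed above.
cat > genes.txt <<'EOF'
chr1 134212701 134230065 Nuak2 8 +
chr1 134212701 134230065 Nuak2 7 +
chr1 33510655 33726603 Prim2, 14 -
chr1 25124320 25886552 Bai3, 31 -
EOF

# Assigning $1 = $1 makes awk rebuild the record using OFS (a tab),
# so every run of spaces/tabs between columns becomes a single tab.
awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' genes.txt > genes.bed

# Sanity check: each line should have exactly 6 tab-separated fields
# (chrom, start, end, name, score column used here for num_exons, strand).
awk -F'\t' 'BEGIN { bad = 0 }
            NF != 6 { print "bad line " NR ": " $0; bad = 1 }
            END { exit bad }' genes.bed \
  && echo "genes.bed looks like valid BED6"
```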
bedtools flank -i genes.bed -g mm9.chromsizes -l 2000 -r 0 -s > genes.2kb.promoters.bed
This gives you the strand-aware upstream regions (before txStart for + strand genes, after txEnd for - strand genes):
chr1 134210701 134212701 Nuak2 8 +
chr1 134210701 134212701 Nuak2 7 +
chr1 33726603 33728603 Prim2, 14 -
chr1 25886552 25888552 Bai3, 31 -
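The awk sketch below shows the arithmetic behind `flank -l 2000 -r 0 -s`. It is an illustration only and does not replace bedtools. In BED's half-open coordinates, the promoter of a + strand gene is [txStart - 2000, txStart). For a - strand gene it is [txEnd, txEnd + 2000). Both intervals are clipped to the chromosome bounds from the genome file. The chromosome length in the script is a made-up placeholder, not the real mm9 value. With these four records the script yields the same coordinates as the bedtools output listed above.

```shell
#!/bin/sh
# Placeholder genome file (tab-delimited: chrom, length).
# 200000000 is NOT the real mm9 chr1 length; use your own mm9.chromsizes.
printf 'chr1\t200000000\n' > chromsizes.example

# The four example genes from the post, as tab-delimited BED6.
printf 'chr1\t134212701\t134230065\tNuak2\t8\t+\n'  >  genes.example.bed
printf 'chr1\t134212701\t134230065\tNuak2\t7\t+\n'  >> genes.example.bed
printf 'chr1\t33510655\t33726603\tPrim2,\t14\t-\n'  >> genes.example.bed
printf 'chr1\t25124320\t25886552\tBai3,\t31\t-\n'   >> genes.example.bed

UP=2000  # promoter length upstream of the TSS, in bp

# First file (NR == FNR): load chromosome lengths.
# Second file: print the upstream flank of each gene, clipped to [0, chrom length].
awk -F'\t' -v OFS='\t' -v up="$UP" '
  NR == FNR { size[$1] = $2; next }
  !($1 in size) { print "no length for " $1 > "/dev/stderr"; next }
  {
    if ($6 == "-") {        # minus strand: upstream lies after txEnd
      s = $3; e = $3 + up
      if (e > size[$1] + 0) e = size[$1]
    } else {                # plus strand: upstream lies before txStart
      s = $2 - up; e = $2
      if (s < 0) s = 0
    }
    print $1, s, e, $4, $5, $6
  }' chromsizes.example genes.example.bed > promoters.example.bed

cat promoters.example.bed
```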
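You can now use this BED file to extract the sequences from the mm9 genome. Add -s so that getfasta reverse-complements the promoters of minus-strand genes, i.e. reports them in the gene's orientation; without it, strand is ignored:
bedtools getfasta -s -fi mm9.fa -bed genes.2kb.promoters.bed -fo genes.2kb.promoters.bed.fa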
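bedtools getfasta reverse-complements minus-strand features only when it is run with its -s option. For a minus-strand gene this makes the promoter read 5' to 3' toward the TSS. The small POSIX sketch below applies the same transformation to a toy string, purely for illustration, since getfasta does it for you. Lowercase letters stay lowercase because UCSC genome FASTAs use lowercase for soft-masked repeats.

```shell
#!/bin/sh
# Reverse-complement a DNA string: reverse it with awk, then complement with tr.
# Upper- and lowercase bases are handled separately so soft-masking is preserved.
revcomp() {
  printf '%s\n' "$1" \
    | awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }' \
    | tr 'ACGTNacgtn' 'TGCANtgcan'
}

revcomp "GATTACAnn"   # prints nnTGTAATC
```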
NOTE: The "mm9.chromsizes" file is a tab-delimited file in which each line holds a chromosome name and that chromosome's length. See the bedtools manual for examples. mm9.fa stands for the mouse reference genome in FASTA format.
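If you don't already have mm9.chromsizes, you can build it from the genome FASTA. Running `samtools faidx mm9.fa` writes mm9.fa.fai, whose first two columns are the sequence name and its length. So `cut -f1,2 mm9.fa.fai > mm9.chromsizes` produces a tab-delimited genome file. Without samtools, the awk sketch below computes the same two columns. It runs on a tiny made-up FASTA (toy.fa, hypothetical) rather than the real mm9 genome.

```shell
#!/bin/sh
# Toy FASTA standing in for mm9.fa (hypothetical data).
cat > toy.fa <<'EOF'
>chrA
ACGTACGTAC
GTAC
>chrB description text
NNNNacgt
EOF

# Print "name<TAB>length" per record: the name is the first word of the header
# (without the ">"), the length is the sum of all sequence lines of that record.
awk -v OFS='\t' '
  /^>/ { if (name != "") print name, len
         name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name, len }
' toy.fa > toy.chromsizes

cat toy.chromsizes
```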
Did you miss a 0 after -r in flank?
@brentp - yep, thank you sir.
Thanks for the solution, but I don't understand what the error in the code is, sorry. Could you please provide the fixed command for the given example?
Does the edit above help?
One issue is solved, but it still says: "Less than the req'd two fields were encountered in the genome file". I need to check my input files, but with this hint I think I will be able to solve the problem.
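That message means bedtools found a line in the genome file that does not split into at least two tab-separated fields. A typical cause is a chromsizes file whose columns are separated by spaces instead of a tab. The minimal sketch below uses a deliberately broken toy file with hypothetical names and lengths. It reports the offending lines and then writes a repaired, tab-delimited copy.

```shell
#!/bin/sh
# Toy genome file: line 2 uses a space instead of a tab (hypothetical data).
printf 'chrA\t1000\nchrB 2000\n' > broken.chromsizes

# Report lines that do not have at least two tab-separated fields.
awk -F'\t' 'NF < 2 { print "line " NR " is not tab-delimited: " $0 }' broken.chromsizes

# Repair: split on any whitespace (awk's default), keep the first two columns,
# join them with a tab, and skip blank lines.
awk -v OFS='\t' 'NF >= 2 { print $1, $2 }' broken.chromsizes > fixed.chromsizes
```

After the repair, pass the fixed file to -g (e.g. fixed.chromsizes in place of mm9.chromsizes).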