Question

Extracting 200 bp flanking sequence around a SNP using bedtools

0

7.9 years ago

paulotyama • 0

Hi,

I have been trying to use bedtools to extract 200 bp of flanking sequence around my SNPs:

bedtools slop -i snps.v03.bed -g Aradu.len.v02 -b 200

Instead, I get this error:
Less than the req'd two fields were encountered in the genome file (Aradu.len.v02) at line 2. Exiting.

What required fields is the code complaining about?

Here's a snippet of my genome file

chrom      size
Aradu.A01  107035537
Aradu.A02  93869048
Aradu.A03  135057546
Aradu.A04  123556382
Aradu.A05  110037037

And here's the SNP file:

Aradu.A02  1992401   AX-147212565
Aradu.A02  67418563  AX-147213997
Aradu.A02  67418823  AX-147213999
Aradu.A02  67418874  AX-147214001
Aradu.A02  67420580  AX-147214004
Aradu.A02  79194257  AX-147214308

Can you help with this?

Kind regards,

Paul

SNP Flanking sequences Bedtools Extract • 4.4k views

ADD COMMENT • link updated 7.9 years ago by GouthamAtla 12k • written 7.9 years ago by paulotyama • 0

1

Can you format your file contents as code (highlight and click on the button with 0s and 1s)? Otherwise, it gets auto-formatted poorly.

It looks like your BED file is not really a BED file. It should start with these 3 columns: chromosome, start, end.

ADD REPLY • link 7.9 years ago by igor 13k

1

Formatting adjusted.

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Thanks Igor.

I reformatted the SNP file as BED, as you suggested, and removed the header from the genome file, but I still get the same complaint.

Paul

ADD REPLY • link 7.9 years ago by paulotyama • 0

score 2 · Answer 1 · 2017-02-03

2

7.9 years ago

GouthamAtla 12k

The problem is with the file formats.

The genome file should have no header line:

Aradu.A01  107035537
Aradu.A02  93869048
Aradu.A03  135057546
Aradu.A04  123556382
Aradu.A05  110037037

The SNP file should be in BED format (chromosome, start, end, name):

Aradu.A02  1992401   1992401   AX-147212565
Aradu.A02  67418563  67418563  AX-147213997
Aradu.A02  67418823  67418823  AX-147213999
Aradu.A02  67418874  67418874  AX-147214001
Aradu.A02  67420580  67420580  AX-147214004
Aradu.A02  79194257  79194257  AX-147214308

You could easily generate that using:

awk -v OFS="\t" '{ print $1,$2,$2,$3}' snps.v03.bed > snps.v03.new.bed
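One caveat on coordinates: BED intervals are 0-based and half-open, so a SNP at 1-based position p is conventionally written as start = p - 1, end = p. Here is a minimal sketch of that variant, assuming the positions in the SNP file are 1-based (the thread does not say). The file names are hypothetical and the two rows are copied from the question.

```shell
# Toy input: chrom, SNP position, SNP ID (whitespace-separated, as in the question).
printf 'Aradu.A02  1992401   AX-147212565\nAradu.A02  67418563  AX-147213997\n' > snps.example.txt

# Emit tab-delimited BED4 with start = position - 1 and end = position.
# ASSUMPTION: the input positions are 1-based.
awk -v OFS='\t' '{ print $1, $2 - 1, $2, $3 }' snps.example.txt > snps.example.bed

cat snps.example.bed
```

With start = p - 1 and end = p, `bedtools slop -b 200` should give an interval covering the SNP plus 200 bp on each side (401 bp, unless it is clipped at a chromosome end). With start = end = p the interval is zero-length before padding and comes out as 400 bp.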

ADD COMMENT • link 7.9 years ago by GouthamAtla 12k

0

Thanks Goutham,

I still get the same complaint even after reformatting both files as you suggested. Any more ideas?

Thanks again,

Paul

ADD REPLY • link 7.9 years ago by paulotyama • 0

0

1) @Goutham did not say to reformat both files, only the SNP file. It would also be useful to show us a few lines of the reformatted SNP file so we can validate it.

2) Are you sure that your genome file is tab-delimited? The error message indicates fewer than two fields. Or the first line (chrom size) could be the problem. Delete it and retest.
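A quick way to test point 2 is to split each line on tabs only and list the lines that do not parse as "name, integer size". This is a rough sketch on a toy file that reproduces the layout from the question (header line, spaces instead of tabs); the file names are hypothetical.

```shell
# Toy genome file: header line plus space-separated columns.
printf 'chrom      size\nAradu.A01  107035537\nAradu.A02  93869048\n' > genome.example.txt

# Split on tabs only; report any line with fewer than two fields
# or a non-integer second field.
awk -F'\t' 'NF < 2 || $2 !~ /^[0-9]+$/ { printf "line %d: %s\n", NR, $0 }' \
    genome.example.txt > genome.problems.txt

cat genome.problems.txt
```

Every line of the toy file is reported here because none of them contains a tab. An empty report would mean the file has the two tab-separated fields the error message asks for.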

ADD REPLY • link 7.9 years ago by harold.smith.tarheel ★ 5.0k

0

Nope, it works perfectly!

I just had to format my genome file with a tab delimiter.
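For anyone hitting the same error, one way to do that conversion is sketched below. It is not necessarily the exact command used here: it assumes the columns are separated by spaces and that any header line has a non-numeric second field. The file names are hypothetical.

```shell
# Toy genome file as posted in the question: header line, space-separated.
printf 'chrom      size\nAradu.A01  107035537\nAradu.A02  93869048\n' > genome.example.txt

# Keep only lines whose second field is an integer (this drops the header)
# and rejoin the two fields with a tab.
awk -v OFS='\t' '$2 ~ /^[0-9]+$/ { print $1, $2 }' genome.example.txt > genome.example.tsv

cat genome.example.tsv
```

The resulting file can then be passed to `bedtools slop` with `-g`, as in the original command.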

ADD REPLY • link 7.9 years ago by paulotyama • 0