Question

bedToBigBed error, what am I doing wrong?

0

7 months ago

Ronin ▴ 10

I am trying to use Homer's scanMotifGenomeWide.pl, and need to convert my bed file to a bigbed file.

scanMotifGenomeWide.pl /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/motifs/eye-motifs/homerMotifs.all.motifs /scratch/ATACseq/FASTQ_ATAC_43955/pfluv-genome/pfluv-genome.fa -bed > scanned-eye-motifsites.bed

sort -k1,1 -k2,2n scanned-eye-motifsites.bed > scanned-eye-motifsites.sorted.bed

However, when I try:

bedToBigBed scanned-eye-motifsites.sorted.bed /scratch/ATACseq/FASTQ_ATAC_43955/pfluv-genome/pfluv-genome.sizes scanned-eye-motifs.bigBed

I get the following error:

Expecting number field 2 line 1 of scanned-eye-motifsites.sorted.bed, got Perca
Expecting number field 2 line 1 of scanned-liver-motifsites.sorted.bed, got Perca
Expecting number field 2 line 1 of scanned-spleen-motifsites.sorted.bed, got Perca

Here is what I see when I try to use the head command on the bed file made from the sort step:

CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    4   11  17-TTCTTTTT 6.933043    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    9   16  17-TTCTTTTT 6.933043    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    10  21  13-TTTTTTTTTTTT 11.062214   +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    12  21  18-TTTTTTTTTT   9.703866    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    14  21  17-TTCTTTTT 7.368553    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    27  38  10-AGAGAGTGTGTG 3.916992    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    29  38  12-AGAGTGTGTG   6.055752    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    31  38  5-CACTCACT  5.516073    -
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    47  56  12-AGAGTGTGTG   6.055752    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    49  56  5-CACTCACT  5.516073    -

Any insights would be really appreciated. Thank you in advance!

Homer software-error UCSC • 874 views

ADD COMMENT • link updated 7 months ago by Pierre Lindenbaum 164k • written 7 months ago by Ronin ▴ 10

score 2 · Answer 1 · 2024-03-20

2

7 months ago

Pierre Lindenbaum 164k

Wrong column delimiter for sort. You used:

sort -k1,1 -k2,2n scanned-eye-motifsites.bed > scanned-eye-motifsites.sorted.bed

but you need:

sort -k1,1 -k2,2n -t $'\t' scanned-eye-motifsites.bed > scanned-eye-motifsites.sorted.bed
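
To illustrate what the -t option changes, here is a minimal sketch on made-up rows (the file demo.bed and its contents are hypothetical, only modelled on the head output in the question). Without -t, sort splits fields on blanks, so key 2 is a word from the chromosome description rather than the start coordinate. Note that this only affects the sort order; it does not remove the spaces from column 1.

```shell
# Hypothetical sample: tab-separated, with spaces inside the first field.
TAB=$(printf '\t')
tmp=$(mktemp -d)
printf 'chrA one two%s10%s21\n' "$TAB" "$TAB"  > "$tmp/demo.bed"
printf 'chrA one two%s4%s11\n'  "$TAB" "$TAB" >> "$tmp/demo.bed"
printf 'chrA one two%s9%s16\n'  "$TAB" "$TAB" >> "$tmp/demo.bed"

# Default field splitting: key 2 is " one", which is not numeric,
# so the start coordinates are not sorted as numbers.
default_starts=$(LC_ALL=C sort -k1,1 -k2,2n "$tmp/demo.bed" | cut -f2 | tr '\n' ' ')

# Tab as the field separator: key 2 is the start coordinate.
tab_starts=$(LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n "$tmp/demo.bed" | cut -f2 | tr '\n' ' ')

echo "default: $default_starts"
echo "with -t: $tab_starts"
```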

ADD COMMENT • link 7 months ago by Pierre Lindenbaum 164k

0

Thanks for the reply. I had thought this might work, but something odd has appeared. The code you corrected me with (thank you again) unfortunately did not work. I went ahead and just tried to open the bed file in VScode and saw this:

bed file screenshot

So it looks like the row shows the chromosome, with simple spaces extending from CM020909.1 to the word 'sequence', followed by a tab to the number 4 there. It's a bit difficult to see (sorry about that!), but you can see the dots that are used to show a space, and also the tab arrow symbols.

Do you think that this might explain why this sort command did not work? And if so, is there a simple workaround to remove this information so the code might work?

ADD REPLY • link 7 months ago by Ronin ▴ 10

0

Please do not paste screenshots of plain text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content exceeds the length allowed here.

ADD REPLY • link 7 months ago by Pierre Lindenbaum 164k

0

It's a bit difficult to see

Use tr to convert the characters to something visible. Don't use a GUI like VScode to handle those files. Use the command line.
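
A minimal sketch of the tr suggestion, assuming the goal is to tell spaces from tabs: spaces become dots and tabs become '>'. The sample line is a shortened, made-up version of the rows in the question.

```shell
TAB=$(printf '\t')
# Hypothetical line: spaces inside the first field, tabs between fields.
line="CM020909.1 Perca fluviatilis${TAB}4${TAB}11"

# Map space -> '.' and tab -> '>' so both are visible in the terminal.
visible=$(printf '%s\n' "$line" | tr ' \t' '.>')
echo "$visible"
```

On the real file the same idea would be: head -n 3 scanned-eye-motifsites.sorted.bed | tr ' \t' '.>'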

but you can see the dots

I don't see anything

ADD REPLY • link 7 months ago by Pierre Lindenbaum 164k

0

Ah, apologies. Here is what I see:

CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    4   11  17-TTCTTTTT 6.933043    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    9   16  17-TTCTTTTT 6.933043    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    10  21  13-TTTTTTTTTTTT 11.062214   +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    12  21  18-TTTTTTTTTT   9.703866    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    14  21  17-TTCTTTTT 7.368553    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    27  38  10-AGAGAGTGTGTG 3.916992    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    29  38  12-AGAGTGTGTG   6.055752    +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    31  38  5-CACTCACT  5.516073    -
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence    47  56  12-AGAGTGTGTG   6.055752    +

Does that help?

ADD REPLY • link 7 months ago by Ronin ▴ 10

0

From what you pasted, there is no tab in your file.
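
Since pasting into a forum can turn tabs into spaces, it may be safer to check the file itself. A small sketch that counts how many lines contain at least one tab (check.bed and its two rows are hypothetical):

```shell
TAB=$(printf '\t')
tmp=$(mktemp -d)
printf 'chrA desc text%s4%s11\n' "$TAB" "$TAB" > "$tmp/check.bed"  # tab-separated
printf 'chrA desc text    9   16\n'           >> "$tmp/check.bed"  # spaces only

# grep -c counts matching lines; wc -l counts all lines.
with_tab=$(grep -c "$TAB" "$tmp/check.bed")
total=$(wc -l < "$tmp/check.bed" | tr -d ' ')
echo "$with_tab of $total lines contain a tab"
```

If every line of the real file contains a tab, the two numbers match.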

ADD REPLY • link 7 months ago by Pierre Lindenbaum 164k

0

Hmm.. you may be right. Is there any way I can fix this issue easily still, you think? When I try to use VScode to remove the text portion, it gives an error message saying that it cannot remove that many lines of text...

ADD REPLY • link 7 months ago by Ronin ▴ 10

0

I can fix this issue easily still, you think?

Yes, use sed. At this point I stop helping you.
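
One possible sed approach, as a sketch only. It assumes the fields after the chromosome description really are tab-separated (as the VScode observation above reports) and that the names in the .sizes file are the bare accessions such as CM020909.1; neither is confirmed in this thread. The file names in.bed and fixed.bed are placeholders.

```shell
TAB=$(printf '\t')
tmp=$(mktemp -d)
# One row in the shape of the question's head output.
printf 'CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence%s4%s11%s17-TTCTTTTT%s6.933043%s+\n' \
  "$TAB" "$TAB" "$TAB" "$TAB" "$TAB" > "$tmp/in.bed"

# Keep the first whitespace-free token of column 1 and drop everything
# from the first space up to the first tab.
sed -E "s/^([^ ${TAB}]+) [^${TAB}]*${TAB}/\1${TAB}/" "$tmp/in.bed" > "$tmp/fixed.bed"
cat "$tmp/fixed.bed"
```

The cleaned file would then be sorted again and passed to bedToBigBed in place of the original.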

ADD REPLY • link 7 months ago by Pierre Lindenbaum 164k

score 0 · Answer 2 · 2024-03-20

0

7 months ago

inedraylig ▴ 70

The bed file doesn't follow the BED format: https://genome.ucsc.edu/FAQ/FAQformat.html#format1

It should be a tab-separated file where the first column is the chromosome, and the second and third columns are the start and end coordinates of the feature.
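
A rough way to check a file against that description is sketched below. It reports line numbers that have fewer than three tab-separated fields, a non-integer start or end, or a space in column 1. The last check is an assumption based on the error message in the question, which suggests bedToBigBed split that column on whitespace. The sample file v.bed is made up.

```shell
TAB=$(printf '\t')
tmp=$(mktemp -d)
printf 'chr1%s4%s11\n' "$TAB" "$TAB"              > "$tmp/v.bed"  # looks valid
printf 'chr1 extra words%s9%s16\n' "$TAB" "$TAB" >> "$tmp/v.bed"  # space in column 1
printf 'chr1%sx%s21\n' "$TAB" "$TAB"             >> "$tmp/v.bed"  # non-numeric start

# Print the number of every line that fails one of the checks.
bad=$(awk -F '\t' '
  NF < 3 || $1 ~ / / || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print NR }
' "$tmp/v.bed" | tr '\n' ' ')
echo "problem lines: $bad"
```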

ADD COMMENT • link 7 months ago by inedraylig ▴ 70

1

It follows the BED format if the delimiter in the chrom name is a space.

ADD REPLY • link 7 months ago by Pierre Lindenbaum 164k

0

Correct, my reply was less thorough than yours (I assumed that the delimiters got lost in the sort command, but didn't look specifically).

ADD REPLY • link 7 months ago by inedraylig ▴ 70