I am trying to use Homer's scanMotifGenomeWide.pl, and need to convert my bed file to a bigbed file.
scanMotifGenomeWide.pl /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/motifs/eye-motifs/homerMotifs.all.motifs /scratch/ATACseq/FASTQ_ATAC_43955/pfluv-genome/pfluv-genome.fa -bed > scanned-eye-motifsites.bed
sort -k1,1 -k2,2n scanned-eye-motifsites.bed > scanned-eye-motifsites.sorted.bed
However, when I try:
bedToBigBed scanned-eye-motifsites.sorted.bed /scratch/ATACseq/FASTQ_ATAC_43955/pfluv-genome/pfluv-genome.sizes scanned-eye-motifs.bigBed
I get the following error:
Expecting number field 2 line 1 of scanned-eye-motifsites.sorted.bed, got Perca
Expecting number field 2 line 1 of scanned-liver-motifsites.sorted.bed, got Perca
Expecting number field 2 line 1 of scanned-spleen-motifsites.sorted.bed, got Perca
Here is what I see when I try to use the head command on the bed file made from the sort step:
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 4 11 17-TTCTTTTT 6.933043 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 9 16 17-TTCTTTTT 6.933043 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 10 21 13-TTTTTTTTTTTT 11.062214 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 12 21 18-TTTTTTTTTT 9.703866 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 14 21 17-TTCTTTTT 7.368553 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 27 38 10-AGAGAGTGTGTG 3.916992 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 29 38 12-AGAGTGTGTG 6.055752 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 31 38 5-CACTCACT 5.516073 -
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 47 56 12-AGAGTGTGTG 6.055752 +
CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 49 56 5-CACTCACT 5.516073 -
Any insights would be really appreciated. Thank you in advance!
Thanks for the reply. I had thought this might work, but something odd has appeared. The corrected command you suggested (thank you again) unfortunately did not work. I went ahead and opened the bed file in VScode and saw this:
So it looks like the row shows the chromosome, with simple spaces extending from CM020909.1 to the word 'sequence', followed by a tab to the number 4 there. It's a bit difficult to see (sorry about that!), but you can see the dots that are used to show a space, and also the tab arrow symbols.
Do you think that this might explain why this sort command did not work? And if so, is there a simple workaround to remove this information so the code might work?
Use tr to convert the characters to something visible. Don't use a GUI like VScode to handle those files; use the command line. I don't see anything.
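As a rough illustration of the tr suggestion, here is a minimal sketch. It assumes the layout described from VScode (spaces inside the first column, tabs between the remaining columns), which may not match the real file. The file names example.bed and example.visible.txt are made up, and the sample line is written with printf so the snippet runs on its own. Tabs are shown as > and spaces as _.

```shell
# Hypothetical sample line: spaces inside the chromosome description,
# tabs between the remaining columns (an assumption, not the real file).
printf 'CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence\t4\t11\t17-TTCTTTTT\t6.933043\t+\n' > example.bed

# Make whitespace visible: every tab becomes '>' and every space becomes '_'.
head -n 1 example.bed | tr '\t ' '>_' > example.visible.txt
cat example.visible.txt
```

On the real data, the same tr call can be applied to the output of head on scanned-eye-motifsites.sorted.bed, and the result pasted as text instead of a screenshot.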
Ah, apologies. Here is what I see:
Does that help?
From what you pasted, there is no tab in your file.
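One way to check this without a screenshot is to count the tab-separated columns per line. The sketch below assumes a line made only of spaces, as in the pasted head output; spaces_only.bed and field_counts.txt are hypothetical names used just for this example.

```shell
# Hypothetical sample line that contains spaces only, no tabs.
printf '%s\n' 'CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 4 11 17-TTCTTTTT 6.933043 +' > spaces_only.bed

# Split on tabs only and tally how many columns each line has.
# Each output row is: <number of lines> <columns per line>.
awk -F '\t' '{ print NF }' spaces_only.bed | sort -n | uniq -c > field_counts.txt
cat field_counts.txt
```

A column count of 1 means the line has no tabs at all, while a clean six-column bed file would report 6 for every line.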
Hmm, you may be right. Do you think there is still an easy way to fix this? When I try to use VScode to remove the text portion, it gives an error message saying that it cannot remove that many lines of text...
Yes, use sed. At this point I stop helping you.
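For anyone landing here later, a minimal sketch of what the sed step could look like. It assumes every line ends with the five fields start, end, motif name, score and strand, and that everything between the accession (CM020909.1) and those fields is the unwanted FASTA description. It accepts either spaces or tabs as separators, since the thread never settled which one the file contains. The names dirty.bed and cleaned.bed are made up, and the sample lines are modelled on the head output above.

```shell
# Two hypothetical input lines: one with tabs after the description, one with spaces only.
printf 'CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence\t4\t11\t17-TTCTTTTT\t6.933043\t+\n' > dirty.bed
printf 'CM020909.1 Perca fluviatilis chromosome 1, whole genome shotgun sequence 31 38 5-CACTCACT 5.516073 -\n' >> dirty.bed

ws='[[:space:]]+'      # one or more spaces or tabs
tok='[^[:space:]]+'    # one whitespace-free token
tab=$(printf '\t')     # a literal tab for the replacement text

# Keep the first token and the last five fields, drop the description in between,
# and write the six kept fields separated by tabs.
sed -E "s/^(${tok})${ws}.*${ws}([0-9]+)${ws}([0-9]+)${ws}(${tok})${ws}(${tok})${ws}([+-])\$/\1${tab}\2${tab}\3${tab}\4${tab}\5${tab}\6/" dirty.bed > cleaned.bed
cat cleaned.bed
```

After cleaning, the sort step probably needs to be run again, because with the description still present the second whitespace-delimited field was "Perca" rather than the start coordinate. bedToBigBed also looks up each chromosome name in the chrom.sizes file, so that file needs to use the same short names.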