bedtools getfasta only reads the first line of my BED file
5.2 years ago

Hi all, hope you can help.

I have a multi-FASTA file containing 730-something prokaryotic genomes (5-6 Mb); all contigs/chromosomes are named in the headers, like so:

>CP009335.1 Bacillus thuringiensis strain HD1011, complete genome
TCCTGATGGAACTTTAATTGATGAAAAGAGTCGTGTAAACTTTTTCCATCTTTCAACCCATCAATCATGC
GCTGCAATTGTACTTTCTTTTCTAAAGGTAATTGAAACCGTAAAAATTCTAATGCCTGCAAAAGGGAGTA
TCCTTTTTCTAATAGTTCTCCTAATCGTTTCAGTAATATGACTTGATCACTTAAACTCCATATTTCCTTA
AACATAAACATCTTCTTCTAAAAACCCTAAAGCGTATCCTTTTCGTATCGAAGATTGTAATGTTTCGTGC
TTGTATGTGACACATTCCCCGTTTGCTTCTTTAATCGCTTGTTTTAACTCATATCCATATAACAACTCAT
AAATACTCGCTTGCCTTACTTGCCTCATTGATTT

I have a BED file that was originally created in Excel, saved as tab-delimited, and converted using dos2unix. It contains one line for each genome, like so:

CP009335.1  1984592 1992438 CP009335.1_genome.tsv   B_thuringesis
CP009720.1  3944559 3952406 CP009720.1_genome.tsv   B_thuringesis
ABDL02000007.1  228801  234535  GCA_000171035.2_ASM17103v2_genomic.tsv  B_cereus
CP026376.1  1520664 1528500 GCA_002952815.1_ASM295281v1_genomic.tsv B_cereus
NZ_CM000714.1   757305  765101  GCF_000003645.1_ASM364v1_genomic.tsv    B_cereus

I ran the following command on the BED file to make sure it was tab-delimited:

awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}' "mybedfile"

And when I manually check it, it looks ok.
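A file can look fine in an editor and still not be split the way a tool expects, so it may help to count fields per line explicitly. The sketch below is only an illustration: `sample.bed` is a made-up file with one tab-separated line and one space-separated line, and the three-field minimum is the usual chrom/start/end requirement for BED.

```shell
# Hypothetical two-line sample: line 1 is tab-separated, line 2 uses spaces
printf 'CP009335.1\t1984592\t1992438\nCP009720.1 3944559 3952406\n' > sample.bed

# Split on tabs only and report the field count for every line.
# A usable BED line needs at least 3 tab-separated fields (chrom, start, end).
field_report=$(awk -F'\t' '{ print "line " NR ": " NF " fields" }' sample.bed)
echo "$field_report"
```

Any line reporting fewer than 3 fields is not tab-delimited. If awk reports far fewer lines than the file should have, the line terminators are worth a look as well.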

When I run bedtools getfasta, I only get a result for the top row. There are no error messages, just one result. I tried copying and pasting some rows from the long BED file into a new file, and it worked after I manually edited in the tabs. But I am hoping to avoid that. So, can anybody help me make bedtools read the whole BED file? And if there is something wrong with the BED file (which seems to be the issue), how can I make it properly tab-delimited?

THANK YOU


Hello ann-katrin.llarena and welcome to biostars.

I tried to copy paste some rows from the long bed file to a new file, and then it worked after manually editing in tabs and such.

That's good evidence that your original BED file isn't completely tab-separated. You can convert all runs of spaces to tabs using sed:

sed 's/ \+/\t/g' input.bed > fixed.bed
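Note that `\+` and `\t` in the sed expression above are GNU extensions and may behave differently in other sed implementations, for example on a Mac. A minimal alternative sketch using awk, assuming no field is meant to contain spaces (the file names and sample content are placeholders):

```shell
# Hypothetical input: fields separated by runs of spaces
printf 'CP009335.1  1984592 1992438\nCP009720.1  3944559 3952406\n' > input.bed

# Reassigning $1 makes awk rebuild each record with OFS (a tab) between fields
awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' input.bed > fixed.bed

# Every line should now have exactly 3 tab-separated fields
awk -F'\t' '{ print NF }' fixed.bed
```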

fin swimmer


Thank you for helping, but it still only outputs the hit for the first row in the BED file and ignores the remainder of the file. Eh... any other takes on the issue?


Could you please upload an extract of your BED file that doesn't work somewhere, so we can take a closer look at it?

Also, please show the exact bedtools command you were using.

Thanks.


Did you prepare the file in Windows? If so, the line terminators might be wrong, and you can use dos2unix to fix your file.
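Line terminators are invisible in most editors, so counting the relevant bytes is one way to see which convention a file uses. This is a sketch on a made-up sample (`mac_style.bed` is a placeholder name, written here with bare carriage returns purely for illustration):

```shell
# Hypothetical sample saved with bare carriage returns (classic Mac style)
printf 'CP009335.1\t1984592\t1992438\rCP009720.1\t3944559\t3952406\r' > mac_style.bed

# Count carriage returns (\r) and line feeds (\n) in the file
cr_count=$(tr -cd '\r' < mac_style.bed | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < mac_style.bed | wc -c | tr -d ' ')
echo "CR: $cr_count  LF: $lf_count"

# Rough reading of the counts:
#   LF only           -> Unix line endings
#   CR and LF equal   -> Windows (CRLF) line endings
#   CR only, no LF    -> classic Mac line endings; tools that split on \n see one line
```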


Thank you, I already did. I should add that I made the file in Excel on a Mac, and in the file it says "converted from mac format".
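Given the Mac origin, one thing that may be worth ruling out is bare carriage returns as line terminators: dos2unix in its default mode targets Windows-style CRLF endings, and its documentation describes a separate Mac mode (`mac2unix`, or `dos2unix -c mac`) for CR-only files. The sketch below uses plain `tr` instead and assumes the file contains only bare CRs; on a CRLF file it would produce empty lines. File names and content are placeholders.

```shell
# Hypothetical sample with bare carriage returns as line terminators
printf 'CP009335.1\t1984592\t1992438\rCP009720.1\t3944559\t3952406\r' > mac_style.bed

# Replace every carriage return with a line feed
tr '\r' '\n' < mac_style.bed > unix_style.bed

# The converted file should now contain one line per record
wc -l < unix_style.bed
```

Whether this applies here can only be confirmed by inspecting the actual file.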
