Hello everyone!
I'm using bedtools maskfasta in a pipeline that I run on many genomes.
I have 2 genomes that fail to run. They are large plant genomes, and I had to split some of the chromosomes in two (at 2 Gb, which was the limit of my other tool) to run another analysis in which I identify transposable elements. Everything is formatted the same way, as you can see:
My genome identifiers: genome.fasta
>Ptabu_chr_10
>Ptabu_chr_1-1
>Ptabu_chr_11
>Ptabu_chr_1-2
>Ptabu_chr_12
>Ptabu_chr_2-1
>Ptabu_chr_2-2
>Ptabu_chr_3-1
>Ptabu_chr_3-2
>Ptabu_chr_4-1
>Ptabu_chr_4-2
>Ptabu_chr_5-1
>Ptabu_chr_5-2
>Ptabu_chr_6
>Ptabu_chr_7
>Ptabu_chr_8
>Ptabu_chr_9
And an example of my gff file: sirevirus.fasta
This file was produced on the previous genome file, so the elements found are within the range of the chromosomes.
Ptabu_chr_11 MASiVE Sirevirus 1647309169 1647317981 . - . ID=Ptabu_chr_11-P-1647309169;length=8813;age=1.9846
Ptabu_chr_11 MASiVE Sirevirus 1647574177 1647583012 . - . ID=Ptabu_chr_11-P-1647574177;length=8836;age=0.1308
Ptabu_chr_11 MASiVE Sirevirus 1648591185 1648599878 . - . ID=Ptabu_chr_11-P-1648591185;length=8694;age=0.2808
Ptabu_chr_1-2 MASiVE Sirevirus 1385320 1399673 . + . ID=Ptabu_chr_1-2-D-1385320;length=14354;age=2.0154
Ptabu_chr_1-2 MASiVE Sirevirus 1698108 1705751 . + . ID=Ptabu_chr_1-2-D-1698108;length=7644;age=0.2192
Ptabu_chr_1-2 MASiVE Sirevirus 5246903 5255035 . + . ID=Ptabu_chr_1-2-D-5246903;length=8133;age=0.4846
The command line:
bedtools maskfasta -fi ../genome.fasta -bed ../sirevirus.fasta -fo genome.fasta_all_fulllength_masked.fasta
The error I get is:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 3505698666) > this->size() (which is 2000000000)
I ran only chr 5-2 of this species as a test and there was no issue. The output created by maskfasta only appends 2 chromosomes before it crashes. The position in the error message '3505698666' doesn't exist in my data, no chromosome is that big.
I have the same issue with the garlic genome, which is 14 Gb and had one chromosome split, but it runs perfectly fine on wheat, which is also 14 Gb but has smaller chromosomes.
If anyone can help me, I would greatly appreciate it!
Thank you :)
What is the largest position in the BED/GFF? What is the longest chromosome in the FASTA?
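One way to answer both questions at once is sketched below. It is only an illustration: the function names are hypothetical, the toy records stand in for the real files, and it assumes the annotation uses standard GFF columns (sequence ID in column 1, end coordinate in column 5).

```python
from collections import defaultdict


def fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def gff_max_ends(lines):
    """Return {seqid: largest end coordinate} from an iterable of GFF lines."""
    max_end = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on any whitespace so tab- and space-separated input both work.
        fields = line.split(None, 5)
        seqid, end = fields[0], int(fields[4])
        max_end[seqid] = max(max_end[seqid], end)
    return dict(max_end)


def out_of_range(lengths, max_end):
    """List (seqid, max_end, length) for IDs whose features exceed the sequence."""
    problems = []
    for seqid, end in sorted(max_end.items()):
        length = lengths.get(seqid)  # None means the ID is absent from the FASTA
        if length is None or end > length:
            problems.append((seqid, end, length))
    return problems


# Toy data standing in for genome.fasta and the GFF file.
fasta = [">chrA some description", "ACGTACGT", "ACGT", ">chrB", "ACGT"]
gff = [
    "chrA\tsrc\tfeat\t3\t10\t.\t+\t.\tID=a",
    "chrB\tsrc\tfeat\t2\t9\t.\t-\t.\tID=b",
]
problems = out_of_range(fasta_lengths(fasta), gff_max_ends(gff))
print(problems)  # [('chrB', 9, 4)]
```

For real files, the same functions can be given open file handles instead of the toy lists. An empty result would mean every feature ends within its chromosome.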
Also, if you have gdb (the GNU debugger), please run your command under it, wait for the error, type backtrace, and show us the output.
Hi, thanks for your reply.
length chr
longest position gff
Output of the gdb:
Did gdb itself crash? If not, please type backtrace after the error and show us the output.
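For reference, the run-then-backtrace sequence can also be done without typing anything at the gdb prompt. The sketch below only builds and prints such a command line. The helper name is hypothetical, and it assumes a gdb that accepts the -batch, -ex and --args options. The bedtools arguments are the ones from the original command.

```python
import shlex


def gdb_backtrace_command(program_args):
    """Build an argv list that runs a program under gdb, then prints a backtrace."""
    # -batch exits gdb when done; each -ex is executed in order once gdb starts.
    return ["gdb", "-batch", "-ex", "run", "-ex", "backtrace", "--args", *program_args]


cmd = gdb_backtrace_command([
    "bedtools", "maskfasta",
    "-fi", "../genome.fasta",
    "-bed", "../sirevirus.fasta",
    "-fo", "genome.fasta_all_fulllength_masked.fasta",
])
# Print the command as a single shell-ready line rather than executing it.
print(shlex.join(cmd))
```

The backtrace is only informative if the binary was built with debugging information.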
I don't think it crashed. Just stopped at the same point it's been stopping when trying to run it.
not sure if this is the expected output
Sadly, bedtools was compiled without the debugging option, so there is no useful trace here.
the lengths look ok...