LASTZ does not accept/like "-" (dash) in Fasta file
1
0
Entering edit mode
7.5 years ago
biomagician ▴ 410

Dear all,

I have tried the following command:

lastz output/genome/ref/seq/chromosome.I.fa output/assembly/celegans/hgap/bristol/contig.fa --notransition --step=20 --nogapped --format=maf > output/AvsB.maf

but LASTZ complains about the dash in my fasta file:

FAILURE: bad fasta character in output/assembly/celegans/hgap/bristol/contig.fa, >contig: -

Does anybody know how to get round this problem?

Best,

C.

lastz fasta alignment • 3.5k views
ADD COMMENT
1
Entering edit mode

Can you post a couple examples of your header lines?

This will remove all dashes: sed -i 's/-//g' myfasta.fa

ADD REPLY
0
Entering edit mode

What are header lines? Do you mean the header of the FASTA file? There is only one sequence in the file, so only one header: >contig

ADD REPLY
1
Entering edit mode

-i in sed will make a change in the original file, I'd rather go without -i for this case because we dont know what is needed to be done specifically.

why is this post in job section anyways?

ADD REPLY
0
Entering edit mode

Just moved it into the 'Question' section :)

ADD REPLY
0
Entering edit mode

That's why I asked to see some header lines, unless the dash is in the sequences. The OP can also make a copy of the file, or sub-sample to test on.

ADD REPLY
0
Entering edit mode

The output of sed is too long, is this what you want?:

>contig
ggaatattgttatttcagaaaatagctaaatgtgatttctgtaattttgcctgccaaatt
cgtgaaatgcaataaaaatctaatatccctcatcagtgcgatttccgaatcagtatattt
ttacgtaatagcttctttgacatcaataagtatttgcctatatgactttagacttgaaat
tggctattaatgccaatttcatgatatctagccactttagtataattgtttttagttttt
ggcaaaactattgtctaaacagatattcgtgttttcaannnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnacggccttgatgtgcnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnaggctgacgannnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnctttgnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn--------------------
------------------------------------------------------------
------------------------------------------------------------

and it goes on for much longer.

ADD REPLY
2
Entering edit mode

Well how did those '-' end up there in the first place...

ADD REPLY
0
Entering edit mode

It's the output of the HGAP assembler. I don't know why HGAP returns gaps in a contig.

ADD REPLY
1
Entering edit mode

You can use sed to replace the file first, sed -e '/^[agctn]/ s/-/n/g' file1.fa >file2.fa

Additionally you have to use the --ambiguous=n and query.fa[unmask] options since you have lower-case letters.

ADD REPLY
0
Entering edit mode

The output of sed is too long, is this what you want?

you need to put it into a new file, sed -i 's/-//g' myfasta.fa > nodash.fasta, otherwise it prints to stdout.

ADD REPLY
0
Entering edit mode

it prints to stdout.

Not with -i ...

ADD REPLY
0
Entering edit mode

I agree, this would destroy the original file - horrible experiences with sed -i and grep > (instead of grep '>')

ADD REPLY
1
Entering edit mode
7.5 years ago
biomagician ▴ 410

Thanks to all. In the end, I revised my HGAP pipeline and the output of HGAP does not contain dashes, so I don't have this problem anymore. I was able to create a MAF file for visualisation in GMAJ. Thanks. C.

ADD COMMENT

Login before adding your answer.

Traffic: 1541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6