velveth (fasta does not seem to be in FastA format)
8.6 years ago
kelvinfrog75 ▴ 10

Hey, I am trying to run velveth on my Mac but keep getting the error message "F1.fasta does not seem to be in FastA format". I have no idea why, since F1.fasta is a FASTA file. Here is my command. Does anyone have an idea what is going on? Both F1.fasta and F2.fasta are FASTA files, and both are in the directory the terminal is working in.

velveth Assem 31 -fasta -shortPaired F1.fasta F2.fasta
Assembly • 3.3k views

Show a head of your files to provide further information


Actually, I think I pasted something wrong. Here is the sequence head again. F1.fasta

>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/1
NGCGAGGCTTCCATCAGTGAAATGTTTCCTTTCTGTTGTTGAAGTTTCATCTCAGCCAGAAGGCGCTCCAACGAAGTTATTTCTTTTTCATAAACAGCCA
>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/2
ACAAGGAAATGGCCGGATATCAGTTTCAGGAAATCATGCGCACCTTGCATAGTGAGCTGAACGAACGATTTGTCGAGACTTATTTTCTGACTAGAAATAT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/1
NGAAGCGTGACAAAATCACGTACAATACTCAGACTACCTCCGCCACCTGAGAAGCTCATATCCGGATAATCCACTTGATATAAATGTCCGAAAATGCGTT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/2
TCGGGAAATGCTGGAAATCAGAGTGGCTGATACAGGGATCGGAATTAAAAAAGAAGACAGAGAACGCATTTTTGGACATTTTTATCAAGTGGTTTATCCT
>SRR041654.3 HWI-EAS284_61BKE:5:1:2:1671/1
NGGTGTGTCTGTATTGCTGTCTGCCGTAACGGTAATTTTCCTGATTTCGGCAACTATCATTGTTTTTACTCCTTTACGTAATTATTTGCCGGGATATATG

F2.fasta

>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/1
NAAATCAGACAAATCTCCGTTATTGGTATATACTTTGGGAGTGTTATGGAATTGCACACCCATTTCGAACATGAAGCCAATTCGTTTCTTAGGAATCGCT
>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/2
GAAATCGGAAACTATCGTATACCTGTAGATCAGAACGGAAATATATCTGGTGGTTTGAAGGTTTCTTCATTCCGTCCTTATCTTGGACTAGGCTTCGGAA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/1
NATGTAGCATTAAAAATTACATCCTAAACTTATCGATAAATGAGTACGCCCATCATAATCATAGTCAGAGGTATTTACACGATCGAATACAACTTTTGCA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/2
TTACAACAAGGGCTGACTAATTATTATACCTGTGATTACTATCGTATTGGCGGGGCGATAAAGGATTTGCAAAAAAAAAAAGAAAAAAGAAGAAAGAGAA
>SRR041655.3 HWI-EAS284_61BKE:6:1:2:1293/1
NACAAGCTGATTAAGCCTATAAATAAGACCTTTATTTTCCCCATCTGAAATAACTCGAATCCTCCTATCAGTTGCATAACTTAAAGCAATTTCTAAGGAA

Do the following

cat  F1.fasta | grep ">"  | wc -l

Then do the same with the other file and give us both numbers. This counts the number of reads, so we can see whether both files have the same number.
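The count can also be wrapped in two small helpers so both files are compared in one step. This is only a sketch: the function names `count_reads` and `same_read_count` are made up for illustration, and it assumes every record header starts with ">" in the first column.

```shell
# Count FASTA records: header lines are the ones that start with ">".
count_reads() {
    grep -c '^>' "$1"
}

# Succeed only if both files hold the same number of records.
same_read_count() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}
```

For example, `count_reads F1.fasta`, then `count_reads F2.fasta`, then `same_read_count F1.fasta F2.fasta && echo "counts match"`.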

8.6 years ago
kelvinfrog75 ▴ 10

Hey, do you think it has something to do with the type of text file? I have a Mac, and my computer reports the kind of file as "TextEdit Document". I suspect this could be the problem, since this command worked fine for me before. Between the last time I used it and now, I think I downloaded some text editor apps, and they may have changed the kind of text file.
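One quick way to test this suspicion, as a minimal sketch: a plain-text FASTA file has ">" as its very first byte, while a file re-saved as rich text typically begins with "{\rtf". The helper name `starts_like_fasta` is made up for this illustration, and whether velveth's format check looks only at the start of the file is an assumption, not something confirmed in this thread.

```shell
# Succeed only if the very first byte of the file is ">".
# A rich-text or otherwise re-encoded file will usually fail this check.
starts_like_fasta() {
    [ "$(head -c 1 "$1")" = ">" ]
}
```

For example, `starts_like_fasta F1.fasta && echo "starts with >"`. Running `file F1.fasta` is another option, although its exact wording differs between systems.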

8.6 years ago

It's possible you've somehow ended up with the wrong line endings.

If you do:

cat -v F1.fasta

Do you see ^M anywhere (at the end of the lines)?
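If scrolling through the whole output is impractical, the carriage returns (the characters that `cat -v` displays as ^M) can be counted instead. The sketch below uses made-up helper names, `count_cr` and `crlf_to_lf`, and the conversion assumes Windows-style CRLF endings.

```shell
# Count carriage-return bytes in a file; 0 means no ^M anywhere.
count_cr() {
    LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Write a copy of the file with CRLF endings converted to LF.
# Assumes CRLF; for CR-only endings use: tr '\r' '\n' < in > out
crlf_to_lf() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, `count_cr F1.fasta`, and if the result is not 0, `crlf_to_lf F1.fasta F1_unix.fasta`.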


I tried but did not see ^M.


If you try to run something like

velveth myOutputdirectory/ 31 -fasta -shortPaired F1.fasta F2.fasta

Do you still get the same problem?

8.6 years ago
kelvinfrog75 ▴ 10

Do you think the N is messing it up? If so, that is weird, since these are the tutorial sequences.

 > head F1.fasta
    > >SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/1 NGCGAGGCTTCCATCAGTGAAATGTTTCCTTTCTGTTGTTGAAGTTTCATCTCAGCCAGAAGGCGCTCCAACGAAGTTATTTCTTTTTCATAAACAGCCA
    > >SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/2 ACAAGGAAATGGCCGGATATCAGTTTCAGGAAATCATGCGCACCTTGCATAGTGAGCTGAACGAACGATTTGTCGAGACTTATTTTCTGACTAGAAATAT
    > >SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/1 NGAAGCGTGACAAAATCACGTACAATACTCAGACTACCTCCGCCACCTGAGAAGCTCATATCCGGATAATCCACTTGATATAAATGTCCGAAAATGCGTT
    > >SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/2 TCGGGAAATGCTGGAAATCAGAGTGGCTGATACAGGGATCGGAATTAAAAAAGAAGACAGAGAACGCATTTTTGGACATTTTTATCAAGTGGTTTATCCT
    > >SRR041654.3 HWI-EAS284_61BKE:5:1:2:1671/1 NGGTGTGTCTGTATTGCTGTCTGCCGTAACGGTAATTTTCCTGATTTCGGCAACTATCATTGTTTTTACTCCTTTACGTAATTATTTGCCGGGATATATG
    head F2.fasta
    >SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/1
    NAAATCAGACAAATCTCCGTTATTGGTATATACTTTGGGAGTGTTATGGAATTGCACACCCATTTCGAACATGAAGCCAATTCGTTTCTTAGGAATCGCT
    >SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/2
    GAAATCGGAAACTATCGTATACCTGTAGATCAGAACGGAAATATATCTGGTGGTTTGAAGGTTTCTTCATTCCGTCCTTATCTTGGACTAGGCTTCGGAA
    >SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/1
    NATGTAGCATTAAAAATTACATCCTAAACTTATCGATAAATGAGTACGCCCATCATAATCATAGTCAGAGGTATTTACACGATCGAATACAACTTTTGCA
    >SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/2
    TTACAACAAGGGCTGACTAATTATTATACCTGTGATTACTATCGTATTGGCGGGGCGATAAAGGATTTGCAAAAAAAAAAAGAAAAAAGAAGAAAGAGAA
    >SRR041655.3 HWI-EAS284_61BKE:6:1:2:1293/1
    NACAAGCTGATTAAGCCTATAAATAAGACCTTTATTTTCCCCATCTGAAATAACTCGAATCCTCCTATCAGTTGCATAACTTAAAGCAATTTCTAAGGAA

F1.fasta looks very strange, while F2.fasta looks OK. Is this just formatting from this system or does F1.fasta really have all sequences in one line?


The first file (F1.fasta) is wrong. You have two ">" at the beginning of each record, and you should have only one. In addition, after the ">name" and the comment (the word starting with HWI...) you must have a line break, so that the sequence appears on a second line.

In other words, F1.fasta must look exactly like what you see in F2.fasta.

Run this code to fix it

$ cat F1.fasta | awk '{print $2 " " $3 "\n" $4}' > F1_fixed.fasta

Note the spaces inside the quotes. Then, after checking the new file, delete the wrong one and rename F1_fixed.fasta to the original name, or pass the new name to velveth directly.

You can also run it in this way

$ awk '{print $2 " " $3 "\n" $4}' F1.fasta > F1_fixed.fasta

In both cases, you obtain the correct fasta file

If you find that the double ">" is not in the original file, bear in mind that print $2 prints the second whitespace-separated field, $3 the third, and so on, and change the field numbers accordingly. "\n" inserts a newline.
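A variant that does not depend on counting fields is sketched below. It is only an illustration under stated assumptions: the function name `split_one_line_records` is made up, every input line is assumed to hold exactly one record, and the last whitespace-separated field is assumed to be the sequence.

```shell
# Split one-line records such as "> >name comment SEQUENCE" into a header
# line and a sequence line, whether one or two leading ">" are present.
split_one_line_records() {
    awk '{
        seq = $NF                              # last field is the sequence
        hdr = $0
        sub(/^[ \t>]+/, "", hdr)               # drop leading blanks and ">" markers
        sub(/[ \t]+[^ \t]+[ \t]*$/, "", hdr)   # drop the trailing sequence field
        print ">" hdr "\n" seq
    }' "$1"
}
```

For example, `split_one_line_records F1.fasta > F1_fixed.fasta`, followed by `head F1_fixed.fasta` to check the result before using it.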

8.6 years ago
kelvinfrog75 ▴ 10

So I think both of them are OK, but I still wonder whether the error is due to the "N" in the FASTA file. I made a mistake in my previous post. F1.fasta looks like this:

>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/1
NGCGAGGCTTCCATCAGTGAAATGTTTCCTTTCTGTTGTTGAAGTTTCATCTCAGCCAGAAGGCGCTCCAACGAAGTTATTTCTTTTTCATAAACAGCCA
>SRR041654.1 HWI-EAS284_61BKE:5:1:2:1334/2
ACAAGGAAATGGCCGGATATCAGTTTCAGGAAATCATGCGCACCTTGCATAGTGAGCTGAACGAACGATTTGTCGAGACTTATTTTCTGACTAGAAATAT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/1
NGAAGCGTGACAAAATCACGTACAATACTCAGACTACCTCCGCCACCTGAGAAGCTCATATCCGGATAATCCACTTGATATAAATGTCCGAAAATGCGTT
>SRR041654.2 HWI-EAS284_61BKE:5:1:2:1511/2
TCGGGAAATGCTGGAAATCAGAGTGGCTGATACAGGGATCGGAATTAAAAAAGAAGACAGAGAACGCATTTTTGGACATTTTTATCAAGTGGTTTATCCT
>SRR041654.3 HWI-EAS284_61BKE:5:1:2:1671/1
NGGTGTGTCTGTATTGCTGTCTGCCGTAACGGTAATTTTCCTGATTTCGGCAACTATCATTGTTTTTACTCCTTTACGTAATTATTTGCCGGGATATATG

and F2.fasta is like this

>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/1
NAAATCAGACAAATCTCCGTTATTGGTATATACTTTGGGAGTGTTATGGAATTGCACACCCATTTCGAACATGAAGCCAATTCGTTTCTTAGGAATCGCT
>SRR041655.1 HWI-EAS284_61BKE:6:1:2:1735/2
GAAATCGGAAACTATCGTATACCTGTAGATCAGAACGGAAATATATCTGGTGGTTTGAAGGTTTCTTCATTCCGTCCTTATCTTGGACTAGGCTTCGGAA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/1
NATGTAGCATTAAAAATTACATCCTAAACTTATCGATAAATGAGTACGCCCATCATAATCATAGTCAGAGGTATTTACACGATCGAATACAACTTTTGCA
>SRR041655.2 HWI-EAS284_61BKE:6:1:2:1264/2
TTACAACAAGGGCTGACTAATTATTATACCTGTGATTACTATCGTATTGGCGGGGCGATAAAGGATTTGCAAAAAAAAAAAGAAAAAAGAAGAAAGAGAA
>SRR041655.3 HWI-EAS284_61BKE:6:1:2:1293/1
NACAAGCTGATTAAGCCTATAAATAAGACCTTTATTTTCCCCATCTGAAATAACTCGAATCCTCCTATCAGTTGCATAACTTAAAGCAATTTCTAAGGAA

The presence of N is not an issue.

velveth Assem 31 -fasta -shortPaired test1.fasta test2.fasta

works perfectly fine for me with those two sequences. Check the end of your files with tail, maybe there's some error hiding there.
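Since tail only shows the last few lines, a scan of the whole file can complement it. The sketch below is a rough check, not a full FASTA validator: the name `find_bad_lines` is made up, and it assumes nucleotide reads with each sequence on a single line, as in the files posted above.

```shell
# Report every line that is neither a header (starts with ">") nor a
# sequence made only of A, C, G, T or N. Blank lines and lines carrying
# stray characters are reported too.
find_bad_lines() {
    awk '!/^>/ && !/^[ACGTNacgtn]+$/ { print FILENAME ": line " NR ": " $0 }' "$1"
}
```

For example, `find_bad_lines F1.fasta` prints nothing when every line passes, and otherwise lists the offending line numbers.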
