Hello. I had a FASTQ file from an Illumina machine and converted it to FASTA. Then I used the dos2unix command to fix any problems with line endings.
dos2unix merged_zea.fasta
After that I used the makeblastdb command to create a database to use with tblastx.
makeblastdb -dbtype nucl -in merged_zea.fasta -input_type fasta -out zeaDB -max_file_sz 2GB
The problem is that makeblastdb returns this error:
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
I searched the net and everyone talks about two possibilities:
- Some mistake in the FASTA file
- Too many N characters in the sequences
My sequences have no NNNN, and here is how they look in the FASTA file:
>M02381-6-000000000-AA1W0-1-1101-15670-1350 1-N-0-1
GAAGCAGTGTTGACGTAGTTCTGAGCCATGGCCATCATGTACTGAACATCGAGGTTAGCTTCAACACAAGAGTTGGGGTTGAAAGAGCAAGCACTGGGGTTGTTGGGTCCGATGACCTTGTCAACCTTGTCGGAAGCGGTTCCCATAGCAG
>M02381-6-000000000-AA1W0-1-1101-15485-1351 1-N-0-1
CCAGAAGTCAAGTCGAACAAAACGCCAATTGAGGATGGCGACGATCCCGACCCGCCAAGTAAGAATGCCACGCCAGAGGAGAAAGCTTCGGCCATCCGCGCAATACAAGACAACTCGGATGAGATGTTCGACGTTATTG
>M02381-6-000000000-AA1W0-1-1101-15538-1354 1-N-0-1
GGGAAGGACTGACTGCTATGAACGCGCTGTTTTATTCACGGTAGCGTACCGGCGCGCACCATTGCTTCACGTGTTGGGTTAGTGAAGAGGGCTCATCGAACGACCCAATTAAGGGCCCCTCTC
>M02381-6-000000000-AA1W0-1-1101-16188-1354 1-N-0-1
GGACAGTTGGGGGTGCTAGTATTTAACGGCCAGAGGTGAAATTCTTGGATTCGTTAAAGACTAACTAATGCGAAAGCATTCACCAAGGATGTCTTCTTTAATCAAGAACGAAAGTTGGGGGATCGAAGAGGATCAGATACCCTCGTAGT
>M02381-6-000000000-AA1W0-1-1101-15230-1355 1-N-0-1
GTGCACACACACGAATGCTCAACACCGCCCGGCCCTATTTGGGTTGAGGTAGAGTGTGCCTGTTTGGACCCGAAAGGTGGTGAACTATGCCTGAACAAGGTGAAGCCAGGGGAAACCCCGGTGGAGGCCTGTAG
>M02381-6-000000000-AA1W0-1-1101-16511-1355 1-N-0-1
ATACTGGAGTTCCCACAGCAATAGATAGATCATGGTCATAGTCATCATAATTATCTTTGCTTCCTGACCTGTTCCCAGAGGAGTCGTAGTC
>M02381-6-000000000-AA1W0-1-1101-14969-1356 1-N-0-1
GGTTTAAAGGGTCCGTAGGCGGTTTTATAAGTCAGTGGTGAAAGTTTGCGGCTTAACCGTAAAGTTGCCATTGATACTGTAGAACTTGAATAATTGTGAAGTGGTTAGAATAAGTAGTGTAGCGGTGAAATGCATAG
>M02381-6-000000000-AA1W0-1-1101-15132-1356 1-N-0-1
CAATTCCGTTGAATTCGAGGAAGGGATAATCCAAGCAATCAGACATTATTTCTTTTTGTTTGAGGTGTCCAAGGCTCCATTCACACTCCACCAGTTATTCCCGTTTTTGAAGGGTACCGACGTTCCCGATCAAGAGATACCCTCCACACT
>M02381-6-000000000-AA1W0-1-1101-15192-1356 1-N-0-1
CGTGAATCGTCTGGCGGCAGACGGCGTTGCCCTCGCGGGCAATGCGCACGATCGCGTAGTTGGCGGTGCAGCCGTCGGCCTCGCCCTGCTCGGTGGAGTTCACCTTCGGGTCCGAAAGGAGGGCGTCGAGGTCCGGCGAGTTGCACTCGG
>M02381-6-000000000-AA1W0-1-1101-16071-1357 1-N-0-1
GCTATGTATGTTGCTATTCAAGCTGTCCTTTCTCTTTATGCATCTGGACGTACAACTGGTATTGTTTTAGATACTGGTGATGGTGTTTCTCACACAGTCCCAATTTATGAAGGATATGCAC
>M02381-6-000000000-AA1W0-1-1101-15463-1357 1-N-0-1
GCTGTGAGCCATGTTGCGGTCAACCTTGCAACACTAGTGACGGATTATATTGCTGTCTTTCTTTCTTCCCTGGATGTCTTTGTGCTTACCCACAATTCTATGCTTCCACTCAAGAACAACAATGTGCTTGGGTCAACCACTGTGTCCCAA
>M02381-6-000000000-AA1W0-1-1101-16160-1357 1-N-0-1
TGTTTGAATTTCTTCACTTTGACATTCAGAGCACTGGGCAGAAATCACATTGTGTTAAAATCTTTTCATGACCATCACAATGCTTTGTTTTAATTAAACAGTCGGATTCCCCTGGTCAGTAACAGTTCTAAATTAGCTGTTCATTGTATA
>M02381-6-000000000-AA1W0-1-1101-15586-1357 1-N-0-1
GGATCATACTCTACCTCGCGCAAACTAGAGATAGGCACAAGACTATGGAAGTTTCGTTGGAAAGCGGCGCCAGAGAGGGTGAAAGCCCCGTGGATTAGATTTGTGTAGCGTGTGAGTTGGGGGTGGCCCCGAGCGAGTCGTGTTGTTTGGG
>M02381-6-000000000-AA1W0-1-1101-16490-1359 1-N-0-1
TTGCATACCCGATTGCCTTCTTAATGATCTCAACCCAGTTGGTCCTCTCCTCCTCTTTGAGAAGATAGAATGTTCTGGTCTTCGAAGGGAAGATCAGGGTGAACGGATAGAGAGTGTTGGAC
>M02381-6-000000000-AA1W0-1-1101-15381-1359 1-N-0-1
AGATACCTTTATGCCGGCTGCTGACTACTTATGCAGGATTGGCATTATGACTGCTTTGACTAGAAAACTCTGGAAAAGAAAAAAAATTGTAAGGCTTGGGCGCTTAGTGTCTCTT
So what produces this error? Can somebody give me a hint?
Thanks.
Possibility number 1 above is very likely the answer: your file is incorrect even though it may look correct to you. dos2unix won't fix all line-ending problems. Take a single line and see what happens. The file that you list above works fine on my system.

I am taking a single line and it works. Maybe a specific line produces this error. But how can I "debug" a FASTA file?
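
One way to "debug" a FASTA file is to scan it line by line and report anything that does not look like a header or a plain nucleotide line. Below is a minimal sketch in Python, not something from the BLAST toolkit. The function names `ambiguous_fraction` and `find_suspect_lines` are made up for illustration. It assumes that anything other than A, C, G, T or U counts as ambiguous and reuses the 40% figure from the warning. It checks every sequence line, not only the first data line of each record, so it may flag more than makeblastdb would.

```python
import sys

# Assumption: only these letters count as unambiguous nucleotides.
UNAMBIGUOUS = set("ACGTUacgtu")


def ambiguous_fraction(seq):
    """Return the fraction of characters in seq that are not A, C, G, T or U."""
    if not seq:
        return 0.0
    return sum(ch not in UNAMBIGUOUS for ch in seq) / len(seq)


def find_suspect_lines(lines, max_ambiguous=0.4):
    """Yield (line_number, reason) for lines that look wrong in a nucleotide FASTA."""
    seen_header = False
    pending_header = None  # line number of a header still waiting for sequence data
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            yield number, "carriage return (CR) line ending"
            line = line.rstrip("\r")
        if not line.strip():
            yield number, "blank line"
            continue
        if line.startswith(">"):
            if pending_header is not None:
                yield pending_header, "header with no sequence data"
            seen_header = True
            pending_header = number
            continue
        if not seen_header:
            yield number, "sequence data before the first header"
        pending_header = None
        frac = ambiguous_fraction(line)
        if frac > max_ambiguous:
            # Leftover FASTQ '+' or quality lines usually end up here.
            yield number, f"{frac:.0%} ambiguous characters"
    if pending_header is not None:
        yield pending_header, "header with no sequence data"


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps the original line endings so stray CRs stay visible.
    with open(sys.argv[1], newline="", errors="replace") as handle:
        for number, reason in find_suspect_lines(handle):
            print(f"line {number}: {reason}")
```

Saved under a name such as check_fasta.py (hypothetical), it could be run as `python check_fasta.py merged_zea.fasta`. Any line it reports is a candidate for the record that makeblastdb is complaining about, for example a quality line left over from the FASTQ conversion or a header with nothing after it.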