8.3 years ago
nora
Hello, please help me. When I tried to check sequence quality with FastQC in the Galaxy interface, I received a message that says: No known encodings with chars < 33 (the FASTA file was downloaded from GenBank and UniProt). Thank you in advance.
FASTA files don't have quality lines. Can you paste an example of the input file?
Well, the file was downloaded from this website: http://www.ncbi.nlm.nih.gov/genome/?term=Lactobacillus%20fermentum%5BOrganism%5D&cmd=DetailsSearch
On this page there are no fastq format files (hence you can't use FastQC, as indicated by @Asaf). If you are actually using a fastq formatted file then provide a direct link for it.
I converted the FASTA file to FASTQ (tabular lines were written as FASTQ reads).
See convert FASTA into FASTQ using linux for information about why converting FASTA to FASTQ will not give meaningful quality scores (which is most of what FastQC operates on). Additionally, most of the FastQC metrics are only useful if you have more than one read, or sequence, in your file. It sounds like you have one FASTA file and want some information about its nucleotide content maybe?
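If nucleotide content is what you are after, it can be computed straight from the FASTA file without going through FASTQ at all. A minimal sketch in Python, assuming a plain FASTA input; the function names `read_fasta` and `base_composition` and the demo record are made up for illustration:

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def base_composition(seq):
    """Return per-character counts and the GC fraction of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    return counts, gc


# Hypothetical two-line record standing in for a real genome file.
example = [">demo_record", "ATGCGC", "GGAT"]
for name, seq in read_fasta(example):
    counts, gc = base_composition(seq)
    print(name, len(seq), dict(counts), round(gc, 3))
```

For a real file, pass an open file handle to `read_fasta` instead of the demo list.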
I wanted to know the meaning of the message No known encodings with chars < 33, because when I used the full file I did not get a result, but when I used just a part of the sequence the FastQC tool worked.
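The message indicates that FastQC found a character with an ASCII code below 33 in a quality line, which is lower than the lowest offset of the common encodings (Phred+33 starts at `!`, code 33). Spaces or tabs left over from a tabular conversion would be one possible cause, though that is a guess without seeing the file. A minimal sketch in Python for locating such characters, assuming unwrapped four-line FASTQ records; `find_low_quality_chars` is a hypothetical helper, not part of FastQC or Galaxy:

```python
def find_low_quality_chars(lines, min_ord=33):
    """Return (record_number, character, code) for quality chars below min_ord.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    problems = []
    for i, line in enumerate(lines):
        if i % 4 != 3:
            # Only the fourth line of each record holds quality characters.
            continue
        qual = line.rstrip("\r\n")
        for ch in qual:
            if ord(ch) < min_ord:
                problems.append((i // 4 + 1, repr(ch), ord(ch)))
    return problems


# Hypothetical record whose quality string contains a space (ASCII 32).
demo = ["@r1\n", "ACGT\n", "+\n", "II I\n"]
for record, char, code in find_low_quality_chars(demo):
    print(f"record {record}: character {char} has ASCII code {code}")
```

Running this over the full file and over the part that works should show which records carry the offending characters.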
Did you see the link to Wikipedia article on FastQ format that @Ido provided in his answer below?
BTW: How did you convert the fasta to fastq (did you use your own code)? Tools I know of (reformat.sh from BBMap) generally set all Q-scores to a fixed fake value for all bases.
I converted the FASTA file to tabular and then to FASTQ in the Galaxy interface.
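To illustrate what a fixed fake Q-score looks like in practice, here is a minimal sketch in Python that writes one FASTQ record with the same placeholder quality for every base. It is not how reformat.sh or the Galaxy converter is implemented; the function name `fasta_record_to_fastq` and the default of Q40 with a Phred+33 offset are assumptions chosen for the example:

```python
def fasta_record_to_fastq(name, seq, fake_q=40, offset=33):
    """Format one sequence as a FASTQ record with a constant placeholder quality.

    fake_q and offset are illustrative defaults (Q40, Phred+33); the quality
    string carries no real information about the sequence.
    """
    qual_char = chr(fake_q + offset)
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


# Hypothetical short record; a real genome would be one very long read.
print(fasta_record_to_fastq("demo_record", "ACGT"), end="")
```

Every character in the quality line is printable and at or above code 33, so a file written this way should not trigger the encoding message, but the quality plots in FastQC would still be meaningless.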