5.6 years ago
hafiz.talhamalik
My FASTQ file looks like this. What's wrong with it? Usually a FASTQ record starts with an @ sign, and after the sequence there is a + sign and then the quality scores. Could someone tell me which format this is and how I can convert it to a normal FASTQ file?
A00183:232:H5Y5JDSXX:3:1101:25437:1016 1:N:0:CGGATTGC+GAGTTAGC + ENST00000482771.1 262 GGGAAAAGCAGCCACCACATGATGCGGGAGAACCCAGAGCTGGTGGAGGGCCGTGACCTGCTGAGCTGCACCAGCTCTGAGCCTCTGACCCTCTGAGAGATGATGTCCTGCCCAGGCCCGATGGCCACTAGGACCCTGCAAGCAACTCTG FFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF,F:FFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 3
A00183:232:H5Y5JDSXX:3:1101:4444:1016 1:N:0:CGGATTGC+GAGTTAGC + ENST00000354694.11 692 GGGCAGCCCATCGTGTGGATCACTCCCTATGCCTTCTCCCATGACCACCCGACAGACGTGGACTACAGGGTCATGGCCACCTTCACCGAGTTCTACACCACGCTGCTGGGCTTTGTCAACTTCCGCCTTTACCAGTTGCTCAACCTCCAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF,FFFFFFFFFFFFF:FFFFFFFFF:FF:FFFFFFFFFFFFF 4
answer: it's not fastq
Yeah, I know. My file extension says it's a FASTQ file, which is why I asked. Do you know which format this is? Any idea?
File extensions don't carry any actual meaning, particularly in the Unix environment. A FASTQ file could just as easily be called .txt, .py, or anything else, or indeed have no extension at all. Extensions exist only by convention. The content of the file is what determines what type it is.
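As a rough illustration of checking content rather than the extension, here is a minimal sketch. The function name looks_like_fastq is hypothetical, and it only inspects the first four lines, so it is a quick sanity check and not a validator.

```shell
# Hypothetical helper: succeeds (exit 0) if the first record of a file
# has the basic FASTQ shape, and fails (exit 1) otherwise.
looks_like_fastq() {
    # Line 1 must start with "@", line 3 with "+", and the sequence
    # (line 2) and quality (line 4) must have the same length.
    head -n 4 "$1" | awk '
        NR == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR == 2 { seqlen = length($0) }
        NR == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        NR == 4 && length($0) != seqlen { bad = 1 }
        END { if (bad || NR < 4) exit 1 }
    '
}
```

It could be used as: looks_like_fastq reads.fastq && echo "looks like FASTQ" (reads.fastq is a placeholder file name). On the lines shown in the question it would fail, since the first line does not start with @.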
The real question isn't what's wrong with your file, it's "How has it ended up like this?". Where has the file come from? Has any upstream processing happened to it that you know of? It certainly looks like it could have been a FASTQ file, once upon a time...
Actually, I don't know how it ended up like this. A friend of mine processed that file a few months ago, and now he doesn't remember what he did or how this happened. It looks to me somewhat like a .sam file without a header. Most probably he renamed it or wrote the output under the wrong name, and the FASTQ file got overwritten.
That is bad in terms of reproducibility. Always save code and log files of a job. If you have the original data better re-run the entire analysis.
This is the real problem. He used the original file without making a copy of it.
If it is a SAM file, it's probably best just to convert it back to FASTQ and start over. That SAM is not going to be much use to you anyway if you don't know what reference it's built around, etc.
Looks like part of a SAM file. You have alignment information in there.
input:
output:
output stats:
If you have a newer version of samtools, just use
samtools fastq infile > outfile.fastq
You may need to convert the sam to a bam first.
cpad0112 That is more elegant of course. Good job
You can restore your FASTQ file using the code below. I split the commands into a step-by-step pipe so you can easily see what each command is doing. I am not sure whether the FFFFFFFs you get are base qualities from the sequencer or alignment qualities, but in any case this will not affect the realignment.
sed "s/^/@/g" adds @ at the beginning of every line (the sequence ID always starts with @).
sed "s/ /_/" replaces the first space with _, because I see that 1:N:0****** is part of the name of your reads.
sed "s/ \+/\t/g" replaces each run of one or more spaces with a tab, so that cut -f 1,2,5,6 can then extract the necessary FASTQ columns. Note that cut keeps columns in their original order (ID, +, sequence, quality), so the + and sequence columns still need to be swapped, for example with awk.
sed "s/\t/\n/g" finally splits every tab onto a new line.
Your FASTQ file should then be ready.
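The steps above can be sketched as a single awk call that also puts the sequence before the + line, which FASTQ requires. This is only a sketch under stated assumptions: the function name restore_fastq is hypothetical, and the columns are assumed to be whitespace-separated in the order shown in the question (read ID, index comment, strand, transcript, position, sequence, quality, count). The strand column is ignored; if reads reported on the - strand were stored reverse-complemented, they would need to be reversed separately, which this sketch does not do.

```shell
# Hypothetical helper: rebuild four-line FASTQ records from the
# one-line-per-read layout shown in the question.
restore_fastq() {
    awk '{
        print "@" $1 "_" $2   # read ID joined to the "1:N:0:..." comment
        print $6              # sequence
        print "+"             # separator line
        print $7              # quality string
    }' "$1"
}
```

Usage would look like restore_fastq broken.fastq > restored.fastq, where both file names are placeholders. Write to a new file so the input is not overwritten again.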