I have a FASTQ file that appears to be truncated. Because of the resulting format error, I can neither load it into FastQC nor align it to a reference genome. The file is around 16 GB. I want to see where the format error occurs (most probably at the end of the file) and edit the file so that it follows the correct format.
Doing this in a text editor is nearly impossible because the machine crashes when I try to open such a large file. Is there another tool I can use to edit a small part of the FASTQ file?
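One way to look at the end of a file this large without opening it in an editor is to seek to the end and read backwards in small blocks. The following is a minimal sketch, not a definitive tool: the function name `tail_lines` and its parameters are made up for illustration, and it assumes the file is uncompressed.

```python
import os


def tail_lines(path, n=20, block_size=1 << 16):
    """Return the last n lines of a file without reading the whole file.

    The file is read backwards from the end in blocks of block_size bytes
    until enough newlines have been collected.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        data = b""
        # Need more than n newlines so the first of the last n lines is complete.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            fh.seek(pos)
            data = fh.read(step) + data
    # latin-1 maps every byte to a character, so stray bytes never raise.
    return [line.decode("latin-1") for line in data.splitlines()[-n:]]
```

Calling `tail_lines("reads.fastq", 12)` (a hypothetical file name) and printing each line with `repr()` makes control characters visible, so a truncated final record or a stray byte shows up directly.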
You can also try validateFiles from Jim Kent's utils to see where the problem is. After you download it, add execute permission with chmod u+x validateFiles.
Yes, I tried this and located where the error has possibly occurred. I still need a way to correct it.
If you tell us what the error says we can try to help you fix it.
As far as I know, the BAM file I have was generated by a different pipeline. When I convert it to FASTQ, the result is a truncated file with format errors such as:
1) a ^B symbol in some places where a line break should be
2) a few incomplete records
3) RNA sequences containing characters such as B, F and @ in addition to ATCG
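The three problems listed above could be handled in a single streaming pass that keeps only well-formed four-line records and writes them to a new file, so the 16 GB input never has to fit in memory. The sketch below rests on several assumptions that are not confirmed by the thread: ^B (byte 0x02) is treated as a missing line break, sequences may contain only A, C, G, T, U and N, quality strings are printable ASCII (codes 33 to 126), and records are not wrapped over more than four lines. All function names are hypothetical. Characters such as B, F and @ are legal in a quality line, so seeing them in a "sequence" may mean the records have shifted out of alignment; the sliding window below resynchronises in that case.

```python
import re
from collections import deque

# Assumed sequence alphabet; adjust if the data uses other IUPAC codes.
SEQ_RE = re.compile(r"^[ACGTUN]+$", re.IGNORECASE)


def is_valid_record(header, seq, plus, qual):
    """Check one candidate four-line FASTQ record."""
    return (
        header.startswith("@")
        and SEQ_RE.match(seq) is not None
        and plus.startswith("+")
        and len(qual) == len(seq)
        and all(33 <= ord(c) <= 126 for c in qual)
    )


def iter_valid_records(lines):
    """Yield (header, seq, plus, qual) tuples, skipping malformed lines.

    A window of four lines slides over the input. When the window is not
    a valid record, only its first line is dropped, so the scan can
    realign on the next real header.
    """
    window = deque()
    for raw in lines:
        # Assumption: a ^B byte stands in for a missing line break.
        for piece in raw.rstrip("\r\n").split("\x02"):
            window.append(piece)
            if len(window) < 4:
                continue
            if is_valid_record(*window):
                yield tuple(window)
                window.clear()
            else:
                window.popleft()


def salvage_fastq(in_path, out_path):
    """Copy only well-formed records to out_path; return how many were kept."""
    kept = 0
    with open(in_path, "r", encoding="latin-1") as src, \
            open(out_path, "w", encoding="latin-1") as dst:
        for record in iter_valid_records(src):
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

A call such as `salvage_fastq("reads.fastq", "reads.clean.fastq")` (hypothetical file names) would leave the original untouched and return the number of records kept. Discarded reads are lost, so for paired-end data the mates would need to be filtered to match.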