4.9 years ago
xiaoleiusc
Hi All,
What tool or program should I run to check the "Per Base Sequence Content" of a FASTA file from NGS sequencing? I know FastQC can report "Per Base Sequence Content" for a FASTQ file, but unfortunately FastQC does not take FASTA as input.
Thanks in advance,
Xiao
Link to an example image of "Per Base Sequence Content" for a FASTQ file: https://photos.app.goo.gl/DuDEiXRUvT9x8Gkn6
What is in the fasta file? Are these reads or transcripts or something else entirely?
Thanks. It's a FASTA file from Illumina sequencing without quality data (a processed FASTQ file in which PCR duplicates have been collapsed).
Just add fake qualities then.
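In case it helps, here is a minimal sketch of what "adding fake qualities" could look like in plain Python. The function names (read_fasta, fasta_to_fastq) and the constant quality character "I" (Phred 40 in the Phred+33 encoding) are my own choices for illustration, not something FastQC prescribes; any valid quality character should do, since only the base composition matters here.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA handle; multi-line records are joined."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta_handle, fastq_handle, qual_char="I"):
    """Write every FASTA record as a 4-line FASTQ record with a constant fake quality.

    Returns the number of records written.
    """
    count = 0
    for name, seq in read_fasta(fasta_handle):
        fastq_handle.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count
```

With real files you would open both handles yourself, e.g. `with open("reads.fa") as fa, open("reads.fq", "w") as fq: fasta_to_fastq(fa, fq)` (the file names are placeholders). The quality-related FastQC modules will of course be meaningless on the result.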
Thanks, I see. I could look for a tool to add fake qualities. But I noticed that FastQC reports "per base content" at every position only for short reads (in my example, the read length is 36 bp). For a longer read (say a 75 bp single-end read), FastQC gives per-position values only for the first bases and then groups the later positions (e.g. instead of base 37, FastQC shows 37-38 in one column). So I wonder whether other tools can report every position over the full read length.
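For reference, the per-position composition itself is simple enough to compute directly from the sequences, with no quality data and no grouping of positions. The sketch below is only an illustration of that idea, not how FastQC does it internally; the function name per_base_content is hypothetical.

```python
from collections import Counter


def per_base_content(seqs):
    """Return one dict per read position mapping base -> percentage at that position.

    Reads may differ in length; each position is normalised by the number
    of reads that are long enough to cover it.
    """
    counts = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    result = []
    for position_counts in counts:
        total = sum(position_counts.values())
        result.append({b: 100.0 * n / total for b, n in position_counts.items()})
    return result
```

Calling `per_base_content(["ACGT", "AAGT", "AC"])` gives a list of four dicts, one per position. Note that if PCR duplicates have been collapsed, the percentages describe the unique sequences rather than the original read counts.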
If you look at the options you'll find that you can change how it bins things.
xiaoleiusc: You have to add the following option to FastQC to disable binning: --nogroup
Be aware that report file sizes will increase when you do this.
Thanks. It seems this option is only available on the command line, not in the FastQC GUI.
Run FastQC on the command line then. You should be able to do this on any OS.
I located the FastQC app on my Mac, right-clicked it and chose "Show Package Contents", then ran ./fastqc --nogroup from there. It works great. Thanks a lot, Genomax!
Excellent. Thanks for confirming.