2.0 years ago
영재
Happy New Year! I am starting a new job doing bioinformatics-related work.
I want to write a Python script that opens a fastq.gz file and counts the total number of each base (A, T, G, C) in the sequences. Given read1 and read2 fastq.gz files, how can I get these stats for each of read1 and read2?
In summary, I would like to obtain the following results from a Python script.
The picture below shows the output I want.
What is the question here? You know what you want to implement and which language to implement it in. So start trying it out and then post your code if you run into issues.
This has already been implemented in existing packages, tools, and command-line hacks, but if you are trying to learn how to do this then go for it.
The following list of past threads on this topic is for reference:
Extract basic info from fastq files in an efficient way?
Counting Number Of Bases In A Fastq File
Number of bases with a certain quality in FASTQ file
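If you do want to try it yourself as a learning exercise, here is a minimal sketch of one possible starting point, not a definitive implementation. It assumes the common four-lines-per-record FASTQ layout (sequences not wrapped over several lines) and uses only the Python standard library. The function names `count_bases` and `report`, and the file names in the usage comment, are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_bases(fastq_gz_path):
    """Count every base character in the sequence lines of a gzipped FASTQ.

    Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    """
    counts = Counter()
    # "rt" opens the gzip stream in text mode so lines are str, not bytes.
    with gzip.open(fastq_gz_path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, None, 4):
            counts.update(line.strip().upper())
    return counts


def report(paths):
    """Print per-file totals for A, T, G, C and N."""
    for path in paths:
        counts = count_bases(path)
        print(f"{path}\ttotal bases: {sum(counts.values())}")
        for base in "ATGCN":
            print(f"  {base}: {counts.get(base, 0)}")


# Hypothetical usage (file names are placeholders):
# report(["sample_R1.fastq.gz", "sample_R2.fastq.gz"])
```

Calling `report` with the read1 and read2 paths prints one block of counts per file. For large files the existing tools mentioned above will likely be faster, but this is enough to check your understanding of the format.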