Tool: AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
8.6 years ago
chen ★ 2.5k

Hi, this tool may save you time: it filters and runs QC on fastq data automatically.

The following introduction is out of date and the newer AfterQC is much more powerful; please check the GitHub page for updates.

AfterQC

project on github: https://github.com/OpenGene/AfterQC
sample report: http://opengene.org/AfterQC/report.html

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

AfterQC goes through all the fastq files in a folder and outputs three folders, good, bad and QC, which contain the good reads, the bad reads and the QC results for each fastq file or pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, X10, X5, NextSeq 500/550, MiniSeq...

Features:

AfterQC does the following tasks automatically:

  • Filters out reads with too low quality, too short length or too many N bases
  • Filters out reads with abnormal PolyA/PolyT/PolyC/PolyG sequences
  • Does per-base quality control and plots the figures
  • Trims reads at front and tail, according to QC results
  • For paired-end sequencing data, AfterQC automatically corrects low-quality wrong bases in the overlapping area of read1/read2
  • Detects and eliminates bubble artifacts caused by fluid dynamics issues in the sequencer
  • Single molecule barcode sequencing support: if all reads have a single molecule barcode (see duplex sequencing), AfterQC moves the barcodes from the reads to the fastq query names
  • Supports both single-end and paired-end sequencing data
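
The filtering rules in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not AfterQC's actual code: the function names and every threshold below are assumptions made up for the example, and the real defaults are on the GitHub page.

```python
# Illustrative thresholds only; these are assumptions, not AfterQC's defaults.
MIN_LENGTH = 35        # reads shorter than this are rejected
MAX_N = 5              # maximum number of N bases allowed
MIN_QUALITY = 20       # a base below this Phred score counts as low quality
MAX_LOW_QUAL = 40      # maximum number of low-quality bases allowed
MAX_POLY_RUN = 35      # a single-base run this long counts as poly-X


def longest_run(seq):
    """Return the length of the longest run of one repeated base."""
    best = current = 0
    previous = None
    for base in seq:
        current = current + 1 if base == previous else 1
        previous = base
        best = max(best, current)
    return best


def classify_read(seq, qual):
    """Return 'good' or the reason the read is rejected.

    `qual` is a Phred+33 encoded quality string as long as `seq`.
    """
    if len(seq) < MIN_LENGTH:
        return "too short"
    if seq.count("N") > MAX_N:
        return "too many N"
    low_quality = sum(1 for ch in qual if ord(ch) - 33 < MIN_QUALITY)
    if low_quality > MAX_LOW_QUAL:
        return "low quality"
    if longest_run(seq) >= MAX_POLY_RUN:
        return "poly-X"
    return "good"
```

A read made of 50 G bases with perfect quality, for example, is classified as "poly-X", which is the kind of read the PolyG filter is meant to catch.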

Dependency:

AfterQC uses the editdistance module. Install it before using AfterQC:

pip install editdistance

WARNING: If you haven't installed the editdistance module, AfterQC will fall back to a pure Python implementation of edit distance, which is extremely slow.
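
The fallback described in the warning can be pictured roughly like this. It is a sketch of the general pattern, not the code AfterQC ships: `distance` and `python_distance` are names invented for the example, while `editdistance.eval` is the function provided by the pip package.

```python
try:
    import editdistance  # compiled implementation, installed with pip

    def distance(a, b):
        return editdistance.eval(a, b)
except ImportError:
    # Module missing: fall back to the slow pure Python version below.
    def distance(a, b):
        return python_distance(a, b)


def python_distance(a, b):
    """Levenshtein distance using row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

The pure Python version does work proportional to len(a) * len(b) in interpreted code for every comparison, which is why installing the module makes such a difference on millions of reads.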

Simple usage:

1. Prepare your fastq files in a folder
2. For single-end sequencing, the filenames in the folder should match *R1*
   For paired-end sequencing, the filenames in the folder should match *R1* and *R2*

cd /path/to/fastq/folder
python path/to/AfterQC/after.py

Two folders will be generated automatically: a folder 'good' stores the good reads and a folder 'bad' stores the bad reads.
AfterQC will print some statistics when it is done, such as how many reads are good, how many are bad, and how many were corrected.
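
Before running, the naming convention from step 2 can be checked with a small helper. This is a sketch under one assumption that is mine, not AfterQC's documented rule: the R2 file name is the R1 file name with its last "R1" replaced by "R2". The function name is hypothetical.

```python
def group_fastq_files(filenames):
    """Split fastq names into (R1, R2) pairs and unpaired R1 names.

    Assumption for this sketch: the mate of an R1 file has the same name
    with the last "R1" replaced by "R2".
    """
    names = set(filenames)
    pairs, singles = [], []
    for name in sorted(names):
        if "R1" not in name:
            continue  # R2 files are reached through their R1 partner
        head, _, tail = name.rpartition("R1")
        mate = head + "R2" + tail
        if mate in names:
            pairs.append((name, mate))
        else:
            singles.append(name)
    return pairs, singles
```

Calling it on `os.listdir(".")` inside the fastq folder shows which files would be treated as pairs and which as single-end under that assumption.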

Quality Control only

If you only want to get quality control statistics, run:

python after.py --qc_only
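
To give an idea of what a per-base quality statistic is, here is a minimal sketch that computes the mean Phred score at each read position. It is not AfterQC's implementation; it assumes Phred+33 encoded qualities and plain four-line fastq records, and both function names are made up for the example.

```python
from itertools import islice


def quality_lines(fastq_lines):
    """Yield the quality string (4th line) of each four-line fastq record."""
    return (line.rstrip("\n") for line in islice(fastq_lines, 3, None, 4))


def per_base_mean_quality(quality_strings, offset=33):
    """Return the mean Phred score at each read position."""
    totals, counts = [], []
    for qual in quality_strings:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]
```

With an open file handle `fh`, `per_base_mean_quality(quality_lines(fh))` gives one value per cycle, which is the kind of curve the per-base quality figures plot.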

Understand the report

  • AfterQC will generate a QC folder, which contains many figures.
  • For paired-end sequencing data, the read1 and read2 figures will be in the same folder, named after read1's filename. R1 means read1, R2 means read2.
  • For single-end sequencing data, the figures are still labelled R1.
  • prefilter means before filtering, postfilter means after filtering.
  • For paired-end sequencing data, AfterQC will do an overlap analysis. read1 and read2 overlap when read1_length + read2_length > DNA_template_length.
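
The overlap condition in the last point can be made concrete with a brute-force sketch. This is not AfterQC's algorithm, only an illustration: the function names, `min_overlap` and `max_mismatches` are assumptions, and the sketch assumes the template is at least as long as each read.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(read1_length, read2_length, template_length):
    """Overlap implied by the condition above; 0 when the reads do not meet."""
    return max(0, read1_length + read2_length - template_length)


def find_overlap(read1, read2, min_overlap=10, max_mismatches=2):
    """Return (overlap_length, mismatch_positions_in_read1) or None.

    The tail of read1 is compared with the head of read2's reverse
    complement, trying the longest candidate overlap first.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for length in range(longest, min_overlap - 1, -1):
        start = len(read1) - length
        mismatches = [start + i for i in range(length)
                      if read1[start + i] != mate[i]]
        if len(mismatches) <= max_mismatches:
            return length, mismatches
    return None
```

For two 20 bp reads from a 30 bp template, `expected_overlap(20, 20, 30)` is 10. The mismatch positions returned by `find_overlap` are the candidates for correction; one plausible rule is to keep whichever base has the higher quality score.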

Hello,

I've got a few questions about the calcs in AfterQC. In the AfterQC paper, you note that "AfterQC can detect the mismatches in the overlapping regions. For those reads with very long overlap (i.e. overlap_len>50)".

In the estimated seq error field in the html report, are only overlaps greater than 50bp considered? And are the errors in these overlaps the only component that goes into the seq error rate calculation?

If only overlaps greater than 50bp go into the calculation, could you please let me know where I should change the source to modify that number (my guess is complete_compare_require in util.py)?

Thanks very much for the software!

Please don't post new questions in the answer section. New Questions need to be asked separately. This post will be moved to a comment.

8.6 years ago
biomaster ▴ 180

Hey bro, I know you were advertising your GitHub project, but your code did save my day! Your tool helped me get rid of the damn polyG errors in NextSeq 500 data!

Thanks man, good project!

Wow, glad to know that AfterQC helps.

7.6 years ago
bioinfo8 ▴ 230

Hi Chen,

'AfterQC' seems to be a wonderful tool which I would like to use for my data. I was wondering if there is any way to use it inside R?

Thanks!

No R implementation yet.

But using it with Python or PyPy is very simple; you can get started in less than 3 minutes.

Ok, thanks. How much RAM would you recommend for running AfterQC on the paired reads of one sample, with each file about 6 GB (R1 ~6 GB and R2 ~6 GB)?

Actually there is no particular RAM requirement for AfterQC.

AfterQC uses very little RAM; 4 GB is quite enough.

This is for one sample?

I meant that a 4 GB system is enough to run AfterQC.

If you want to run many samples concurrently, a bit more memory may be required.

For example, a 16 GB system should be fine for running 20 samples concurrently.
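
Running several samples at once can be done by launching one AfterQC process per sample folder. A minimal sketch, assuming one folder per sample and an absolute path to after.py; the function names and the `max_concurrent` default are made up for the example. The figures above work out to roughly 0.8 GB per sample, so choose the limit to suit the machine.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_jobs(sample_dirs, after_py):
    """Return one (command, working_directory) pair per sample folder.

    `after_py` should be an absolute path, because each command runs
    with its sample folder as the working directory.
    """
    return [([sys.executable, after_py], folder) for folder in sample_dirs]


def run_all(sample_dirs, after_py, max_concurrent=4):
    """Run AfterQC in every folder, at most `max_concurrent` at a time."""
    def run(job):
        command, folder = job
        return subprocess.run(command, cwd=folder).returncode

    # The threads only wait on child processes; the real work happens
    # in the separate AfterQC processes.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run, build_jobs(sample_dirs, after_py)))
```

`run_all` returns one exit code per folder, in the same order as `sample_dirs`, so failed samples are easy to spot.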

I am trying to run AfterQC, since I saw here that it needs only 4 GB of RAM, but I am getting a memory error. (I have a 250 GB SSD, a 1 TB HDD and 8 GB of RAM.)

7.4 years ago
chen ★ 2.5k

AfterQC v0.9.4 was just released. When run with PyPy, it is now 3X faster than previous versions.

6.9 years ago
zhimenggan • 0

How can I run AfterQC in batch mode with multiprocess support?

AfterQC doesn't support multi-threading since it's written in Python. You can use fastp, another tool I developed, which is much faster and more powerful.
