Any tools to get clean reads from Illumina data?
7.5 years ago
J.F.Jiang ▴ 930

Hi all,

I am dealing with DNA sequencing data for a couple of samples. The data we received is called PF data.

I was told that this PF data can be processed into clean data by removing reads with many Ns or low quality.

I am wondering whether there are existing tools that I can use to get this clean data?

Thanks,

Junfeng

sequence clean reads • 4.0k views

Did you check this thread? You might be especially interested in "PRINSEQ":

Looking For Reliable Tools To Do Quality Filtering Of Fastq Files

Depending on what you want to do with the data, chances are that you can just use it as-is without any "cleaning".

We performed the sequencing on an XTen and got a Q30 of around 82%, whereas we expected more than 90%.

The BBMap Clumpify tool has been very useful for getting rid of Illumina platform-specific optical duplicates and tile-edge duplicates.

Hello!

You can use the command-line tools sickle and seqtk to trim your files by quality. Then you can compare your trimmed and untrimmed data visually using the Bioconductor package qrqc.

Check their manuals to see how they work (they are not difficult to use for simple tasks).

7.5 years ago
GenoMax 148k

Check your data with FastQC. Nowadays it would be rare to get data with many Ns (especially if it is PF data, as you state above). There are many scan/trim programs (bbduk from BBMap, trimmomatic, cutadapt etc.) that can be used to "clean" the data by removing Illumina adapter sequences and extremely low-quality bases or Ns.
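To make the idea of "removing reads with many Ns or low quality" concrete, here is a minimal Python sketch. It is not how bbduk, trimmomatic or cutadapt work internally; the function names, the 10% N cutoff and the mean-quality cutoff of 20 are assumptions chosen only for illustration, and Phred+33 encoding is assumed.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq, qual, max_n_frac=0.1, min_mean_q=20):
    """Keep a read only if it has few Ns and an acceptable mean quality.

    The two thresholds are hypothetical defaults, not values from any tool.
    """
    if not seq:
        return False
    n_frac = seq.upper().count("N") / len(seq)
    return n_frac <= max_n_frac and mean_phred(qual) >= min_mean_q


# Three toy records: one good, one with 40% Ns, one with very low quality.
example = [
    "@good", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@many_n", "ANNNNCGTAC", "+", "IIIIIIIIII",
    "@low_q", "ACGTACGTAC", "+", "##########",
]
kept = [h for h, s, q in parse_fastq(example) if keep_read(s, q)]
print(kept)
```

For real data, one of the dedicated tools above is the better choice, since they also handle adapters, paired reads and compressed files.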

All of these tools are designed to trim reads with low quality or other issues, not to remove reads entirely.

With the right combination of trimming options, these tools will completely remove reads that fail a quality criterion (e.g. if a read becomes shorter than "n" bases after trimming). Why are you worried about that, by the way?
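A rough sketch of that "trim, then drop what is too short" logic, in Python. The simple 3'-end trimming used here is an assumption for illustration only; the actual tools use their own trimming algorithms, and the names `trim_3prime`, `trim_and_filter`, `min_q` and `min_len` are hypothetical.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_and_filter(reads, min_q=20, min_len=5):
    """Trim each read, then discard it entirely if it ends up shorter than min_len."""
    for name, seq, qual in reads:
        tseq, tqual = trim_3prime(seq, qual, min_q)
        if len(tseq) >= min_len:
            yield name, tseq, tqual


# 'I' is Q40 and '#' is Q2 in Phred+33.
reads = [
    ("r1", "ACGTACGTAC", "IIIIIIII##"),  # loses 2 bases, survives
    ("r2", "ACGTACGTAC", "III#######"),  # only 3 bases left, removed
]
result = list(trim_and_filter(reads))
print(result)
```

So a read is never removed directly for being "bad"; it is removed because too little of it is left after trimming.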

We recently sequenced the same sample in different runs to check the reproducibility of the panel. However, we found different genotypes for several variants because of slight changes in ABRatio, defined as the fraction of reads carrying the alternative allele. For example, GATK assigned Hom_Ref when ABRatio = 0.06 but Het when it was 0.07. I am confused, and was told this might be due to sequencing error, given that the average Q30 is around 80%.
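For reference, the two quantities mentioned here can be sketched in a few lines of Python. Q30 is the fraction of bases with a Phred score of at least 30, and a Phred score Q corresponds to an error probability of 10^(-Q/10). The read counts below (a depth of 100) are hypothetical and only show how small the gap between 0.06 and 0.07 is; the function names are made up for this example.

```python
def phred_to_error(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def q30_fraction(quals, offset=33):
    """Fraction of bases with Phred >= 30 across quality strings (Phred+33 assumed)."""
    total = sum(len(q) for q in quals)
    high = sum(1 for q in quals for c in q if ord(c) - offset >= 30)
    return high / total


def allele_balance(ref_reads, alt_reads):
    """Fraction of reads supporting the alternative allele."""
    return alt_reads / (ref_reads + alt_reads)


# Toy quality strings: 'I' is Q40, '#' is Q2, so 8 of 10 bases are >= Q30.
q30 = q30_fraction(["IIIII", "III##"])

# Hypothetical depth of 100: a single extra alt read moves 0.06 to 0.07.
ab_run1 = allele_balance(ref_reads=94, alt_reads=6)
ab_run2 = allele_balance(ref_reads=93, alt_reads=7)

print(q30, phred_to_error(30), ab_run1, ab_run2)
```

At that depth the two runs differ by one read, so a ratio sitting that close to a calling boundary can flip between runs regardless of the overall Q30.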
