FastQC: per tile sequence quality after using filterbytile.sh
6.4 years ago
Nagesh ▴ 10

Hello, I have tried filterbytile.sh to improve the quality of my library. Although the per base sequence quality is above Phred 30, the per tile sequence quality gets worse toward the right-hand end of the reads. Is there another way to trim the low-quality bases at that end to get better per tile sequence quality? Thanks in advance.

I have attached the per tile sequence quality plot after filterbytile.sh filtering: f1

The Trimmomatic output was used as input to filterbytile.sh.

next-gen sequencing • 5.1k views

Unless you have a really good reason, you should not need to use filterbytile.sh with recent data. What was the logic behind using it in this instance? We seem to be missing the complete picture here, as hinted by @h.mon.


The data are from a microbial genome, and I am trying to improve the data quality as much as possible. I would like to see whether that gives a better genome assembly.


The figure I posted above is from after filterbytile.sh filtering with the following command:

filterbytile.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=f1.fq out2=f2.fq trimq=1 qtrim=rl lowqualityonly=f ud=0.75 qd=1 ed=1 ua=.5 qa=.5 ea=.5

This is the per tile quality figure for the raw data: f1

There is not much change from Trimmomatic to filterbytile.sh.

I ran Trimmomatic with the parameters SLIDINGWINDOW:2:30 MINLEN:20 LEADING:30 TRAILING:30.


I think there was no need to use filterbytile.sh in this case. If you are staying with the BBMap suite, I suggest using bbduk.sh to scan and trim your data to remove any extraneous sequence. Since you want to do de novo assembly, you should also quality filter at Q20. That should be all you need before going into SPAdes or a similar assembler. If you have really deep coverage, normalizing the reads may also be needed.


You are using very strict settings, meant for when you know there is a serious problem (ref: Introducing FilterByTile: Remove Low-Quality Reads Without Adding Bias). However, the picture you linked does not indicate any serious tile problems.

6.4 years ago
h.mon 35k

Is the figure you posted from before or after filterbytile.sh? And before or after Trimmomatic? Did you run FastQC on the raw reads, and then again after each pre-processing step?

You don't have systematically bad tiles; you have the commonly seen decrease in quality as the sequencing cycles progress. If the figure was generated after Trimmomatic filtering, you may have to change your settings. What was the command line you used? If the figure is from before Trimmomatic filtering, run FastQC again and compare.
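
To make the distinction between a tile problem and an ordinary per-cycle decline concrete, here is a minimal Python sketch in the spirit of the FastQC per tile plot (it is not FastQC's own calculation). It assumes Phred+33 qualities and Casava 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y), where the tile is the fifth colon-separated field; the function names are made up for illustration.

```python
from collections import defaultdict


def tile_of(header):
    """Return the tile field of a read header.

    Assumption: Casava 1.8+ style header,
    @instrument:run:flowcell:lane:tile:x:y [comment].
    """
    return header.split()[0].split(":")[4]


def per_tile_deviation(records):
    """Compute per-tile quality deviation for each cycle.

    records: iterable of (header, quality_string), Phred+33 assumed.
    Returns {tile: {cycle: tile mean quality - all-tile mean quality}}.
    """
    sums = defaultdict(lambda: defaultdict(int))    # tile -> cycle -> summed quality
    counts = defaultdict(lambda: defaultdict(int))  # tile -> cycle -> number of bases
    for header, qual in records:
        tile = tile_of(header)
        for cycle, ch in enumerate(qual):
            sums[tile][cycle] += ord(ch) - 33
            counts[tile][cycle] += 1

    # Mean quality per cycle over all tiles (weighted by number of bases).
    cycles = sorted({c for t in counts for c in counts[t]})
    overall = {}
    for c in cycles:
        total = sum(sums[t][c] for t in counts if c in counts[t])
        n = sum(counts[t][c] for t in counts if c in counts[t])
        overall[c] = total / n

    return {
        t: {c: sums[t][c] / counts[t][c] - overall[c] for c in counts[t]}
        for t in counts
    }
```

If quality simply drops in later cycles on every tile, the deviations stay close to zero everywhere; a genuinely bad tile shows up as strongly negative values for that tile only.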


OK, I'll chime in on this. I have the following per_tile_quality plot:

per_tile_quality

This is for R2 of paired-end data, after quality trimming with BBDuk. I was (to be honest) not even aware of this per tile issue, so my question is: should I be running filterbytile.sh here?


I am not sure what is going on in that block of tiles at the top right. You could examine some of those reads. It could just be an artefact of how FastQC is plotting that data. FastQC samples a fraction of the data for most of the parameters it checks and I am not sure how much data it uses for these plots.

That said, most data I have seen of late rarely needs quality filtering (unless you are doing de novo work and want to be strict about quality).


Thanks for the insight, genomax. By the way, I plotted those values at per-base resolution, so without the usual binning that FastQC applies, if that makes any difference.

Yes, it is for de novo assembly, so some strictness on quality is desirable, though I usually don't take it to extremes.

How would you 'select' reads that fall in that top right corner?


I was not referring to the binning of cycles for plotting but to the down-sampling of data during analysis. FastQC does not look at the entire dataset, since that would take too much time and memory. Based on info from Dr. Simon Andrews, the per tile plot only tracks 10% of the data, while the k-mer module uses only 2%.

You should be able to read the tile numbers at the top of the Y axis and then grep for those reads in your FASTQ files.
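
As an alternative to grep, a small Python sketch along these lines could pull out whole four-line records for the tiles of interest. It assumes Casava 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y), so the tile is the fifth colon-separated field of the read name; the function name and the tile number in the usage note are only illustrative.

```python
def reads_from_tiles(lines, tiles):
    """Yield (header, seq, plus, qual) for reads located on the given tiles.

    lines: iterable of FASTQ lines, e.g. an open text file handle.
    tiles: set of tile numbers as strings, e.g. {"2119"}.
    Assumption: Casava 1.8+ headers, tile = 5th colon-separated field.
    """
    it = iter(lines)
    # Pulling four items at a time from the same iterator walks the file
    # record by record; an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        fields = header.split()[0].split(":")
        if len(fields) >= 5 and fields[4] in tiles:
            yield header, seq, plus, qual
```

For a gzipped file this could be fed with gzip.open("R2.fastq.gz", "rt") (a hypothetical file name), passing the tile numbers read off the plot as the second argument.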


OK, yes, I just added that as additional, potentially useful info.

I see, ok, thanks, will have a look at that.
