Contamination in my FASTQC files
8 months ago

Hi, everyone. I hope someone can help me with this problem.

We have sequenced five bacterial genomes (B1, B2, B3, B4 and B5). When I ran FastQC on my third genome (B3), I got two peaks in the GC content plot, which suggests contamination from another bacterium.

[Image: FastQC GC content plot for B3]

I found that the contamination comes from my second strain (B2), so I used bowtie2 with the --un-conc option to remove the contaminating reads and keep the reads from my third strain. I mapped against the assembly of my B2 strain and a reference genome from NCBI to remove as many contaminating reads as possible. But when I ran FastQC again on the new "clean" FASTQ files from B3, I still got two peaks in GC content, although the second peak is now smaller.

[Image: FastQC GC content plot for B3 after read removal]

According to DFAST and TYGS, my B3 strain is possibly Bacillus wiedmannii. I thought I could run bowtie2 against this reference genome and keep only the mapped reads. But I'm not sure: could I do this, or what else can I do?

Thanks in advance

Fastqc bowtie2 • 695 views

Bacterial whole genome sequencing is not my field but I don't see why you should get two discrete peaks just by bacterial contamination. Can this be primer dimers or adapters that got sequenced? How about adapter content in fastqc and overrepresented seqs?

Hi, thanks for your answer.

Yes, my FastQC output indicates poly-G (~0.1% of the library), but I don't think that was a problem. The real problem was the contamination from my B2 strain. I've worked on this problem throughout the day: I took out all reads with GC content > 47% (that was the min GC content for my B2 fastq/reads) using reformat.sh from BBTools (reformat.sh in=nohit.1 in2=nohit.2 -out=cg631 -out2=cg632 mingc=0 maxgc=0.47). The result was this:

[Image: FastQC result after GC filtering]

I think that is a good result, but I'm new to bioinformatics tools, so I can only hope so.
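
To make the GC filter above concrete, here is a minimal Python sketch of the idea: compute the GC fraction of a read pair and keep the pair only if it is at or below a cutoff (0.47, as in the command above). The function names `gc_fraction` and `keep_pair` are made up for illustration, and pooling both mates before computing GC is an assumption of this sketch; it may not match how reformat.sh treats pairs internally.

```python
def gc_fraction(seq):
    """Return the fraction of G/C among called bases (A, C, G, T).

    Ambiguous bases such as N are ignored. Returns 0.0 if no base is called.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def keep_pair(seq1, seq2, max_gc=0.47):
    """Keep a read pair if the GC fraction of both mates combined is <= max_gc.

    Assumption: GC is computed over the pooled bases of the two mates.
    """
    return gc_fraction(seq1 + seq2) <= max_gc


# Hypothetical toy reads, only to show the behaviour of the filter.
pairs = [
    ("ATATATAT", "ATGCATAT"),  # low GC, kept
    ("GGCCGGCC", "GCGCATGC"),  # high GC, dropped
]
kept = [pair for pair in pairs if keep_pair(*pair)]
print(len(kept), "of", len(pairs), "pairs kept")
```

Note that a filter like this cannot tell where a read came from: any genuine B3 read that happens to be GC-rich is dropped as well.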

Is your goal to compare the genomes? If you can't redo the sequencing, my advice would be to apply the same filter to all your genomes to avoid unnecessary bias and to ensure your filter doesn't excessively alter your results.

Hi, thanks. No, comparing the genomes is not my goal, but I will be careful with the filters.

I took out all reads with GC content > 47% (that was the min GC content for my B2 fastq/reads)

Hard to believe there is not a single read with less than 47% GC in B2. This all sounds very unconventional to me, especially filtering by GC content, as this might eliminate regions of the B3 genome with high GC content. But again, this is not exactly my field.

I know, but I can't find another solution; this is my best idea. So I am going to be less aggressive with the GC content threshold for the reads I take out. After that, mapping against the B2 assembly will tell me the percentage of B2 reads that remain in the B3 FASTQ files. If the percentage is below 1%, I'll go on with the downstream analysis.
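
The 1% check described above could be sketched as follows. bowtie2 ends its alignment summary with a line of the form "NN.NN% overall alignment rate"; this snippet reads that number and compares it with the cutoff. The function names and the example summary text are hypothetical, and treating the overall alignment rate against the B2 assembly as "percent of B2 reads remaining" is an assumption: reads from regions that B2 and B3 share would also align, so the rate may overestimate the contamination.

```python
import re


def overall_alignment_rate(summary_text):
    """Extract the overall alignment rate (in percent) from a bowtie2 summary."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def contamination_ok(summary_text, max_percent=1.0):
    """Return True if the alignment rate is strictly below max_percent."""
    return overall_alignment_rate(summary_text) < max_percent


# Hypothetical summary tail, as if the filtered B3 reads had been mapped to B2.
example_summary = "0.85% overall alignment rate"
print(contamination_ok(example_summary))
```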
