How to remove the contaminant reads from metagenome sequencing
5.7 years ago
jeccy.J ▴ 60

Hi Everyone,

I am using Illumina shotgun sequencing data for gut microbiome analysis. For a large dataset, I have sequenced data from a negative control (no sample, mainly to check for lab and reagent contamination) and a positive control (with sample). I used Kaiju to assign a taxonomic classification to each sequenced read. Interestingly, I found a few percent of reads coming from the negative control, which I would like to exclude from my final analysis. Can anyone suggest tools or a workflow for removing those reads from my final analysis?

Thanks in advance.

JJ

metagenome • 2.1k views
5.7 years ago

I think you will find BBSplit useful for your purpose.


This could be one option. I also had a look at KneadData, but the problem is that there are over 100 genomes. A small fraction of reads turned out to be present in both conditions, so I need to check the frequency of each read and remove them.

5.7 years ago
gb ★ 2.2k

It depends on what your final analysis is. If you are working with OTU tables, you can use https://www.drive5.com/usearch/manual/cmd_otutab_trim.html


It's not an OTU table. My final table looks like this:

> U NB501887:114:HHKJ2AFXY:1:11101:16040:4965:N:0:TACCTGAC+CTCCTTAC#0   0
> U NB501887:114:HHKJ2AFXY:1:11101:23022:4971:N:0:TACCTGAC+CTCCTTAC#0   0
> C NB501887:114:HHKJ2AFXY:1:11101:16939:5271:N:0:TACCTGAC+CTCCTTAC#0   563466
> C NB501887:114:HHKJ2AFXY:1:11101:6457:5652:N:0:TACCTGAC+CTCCTTAC#0    864142
> C NB501887:114:HHKJ2AFXY:1:11101:20880:5787:N:0:TACCTGAC+CTCCTTAC#0   864142
> U NB501887:114:HHKJ2AFXY:1:11101:12636:5880:N:0:TACCTGAC+CTCCTTAC#0   0
> C NB501887:114:HHKJ2AFXY:1:11101:17541:6015:N:0:TACCTGAC+CTCCTTAC#0   864142
> U NB501887:114:HHKJ2AFXY:1:11101:21546:6153:N:0:TACCTGAC+CTCCTTAC#0   0
> C NB501887:114:HHKJ2AFXY:1:11101:26668:6519:N:0:TACCTGAC+CTCCTTAC#0   1403932
> C NB501887:114:HHKJ2AFXY:1:11101:25590:6529:N:0:TACCTGAC+CTCCTTAC#0   36809
> C NB501887:114:HHKJ2AFXY:1:11101:14218:6775:N:0:TACCTGAC+CTCCTTAC#0   724
> C NB501887:114:HHKJ2AFXY:1:11101:17531:7412:N:0:TACCTGAC+CTCCTTAC#0   194
> C NB501887:114:HHKJ2AFXY:1:11101:2501:7596:N:0:TACCTGAC+CTCCTTAC#0    131567
> C NB501887:114:HHKJ2AFXY:1:11101:25521:8037:N:0:TACCTGAC+CTCCTTAC#0   562
> C NB501887:114:HHKJ2AFXY:1:11101:9043:8347:N:0:TACCTGAC+CTCCTTAC#0    1783272
> C NB501887:114:HHKJ2AFXY:1:11101:2895:8555:N:0:TACCTGAC+CTCCTTAC#0    864142
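
A table in this three-column layout (classification flag `C`/`U`, read name, NCBI taxon ID) can be tallied per taxon and used to drop sample reads whose taxon also appears in the negative control. The following is only a minimal sketch under assumptions: the function names and the `min_control_reads` threshold are hypothetical, the read names in the toy data are made up, and removing every taxon seen in the control may be too aggressive if genuine gut taxa also show up there.

```python
from collections import Counter


def parse_kaiju_lines(lines):
    """Yield (status, read_name, taxon_id) from three-column Kaiju-style lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        yield fields[0], fields[1], fields[2]


def taxon_counts(lines):
    """Count classified ('C') reads per taxon ID."""
    return Counter(
        taxid for status, _, taxid in parse_kaiju_lines(lines) if status == "C"
    )


def filter_sample(sample_lines, control_lines, min_control_reads=1):
    """Drop classified sample reads whose taxon has at least
    min_control_reads classified reads in the negative control."""
    control = taxon_counts(control_lines)
    flagged = {taxid for taxid, n in control.items() if n >= min_control_reads}
    return [
        (status, read, taxid)
        for status, read, taxid in parse_kaiju_lines(sample_lines)
        if not (status == "C" and taxid in flagged)
    ]


# Toy records: hypothetical read names, taxon IDs borrowed from the table above
control = ["C ctrl_read_1 864142", "U ctrl_read_2 0"]
sample = [
    "C read_1 563466",
    "C read_2 864142",
    "U read_3 0",
]
kept = filter_sample(sample, control)
print(kept)
```

Raising `min_control_reads` keeps taxa that occur only sporadically in the control, which is one way to account for the per-taxon frequency rather than mere presence.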

I don't know if there is a good statistical method for this, but regarding

> I found a few percent of reads coming from the negative control,

would it be an option to simply subtract this percentage from all the counts? You would need to write a script for it.
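
One possible reading of "subtract this percentage" is per taxon: take each taxon's share of classified reads in the negative control, scale it to the sample's total, and subtract it, clipping at zero. The sketch below assumes that interpretation. The function name `subtract_control_fraction` is hypothetical, the read counts are invented for illustration, and this is a crude heuristic rather than a validated statistical method.

```python
def subtract_control_fraction(sample_counts, control_counts):
    """Subtract each taxon's control proportion, scaled to the sample total.

    Both arguments map taxon ID -> classified read count.
    Returns a new dict of adjusted counts (floats), never below zero.
    """
    control_total = sum(control_counts.values())
    sample_total = sum(sample_counts.values())
    adjusted = {}
    for taxid, count in sample_counts.items():
        if control_total > 0:
            control_fraction = control_counts.get(taxid, 0) / control_total
        else:
            control_fraction = 0.0  # empty control: nothing to subtract
        expected_contamination = control_fraction * sample_total
        adjusted[taxid] = max(count - expected_contamination, 0.0)
    return adjusted


# Toy counts: taxon IDs from the thread, read counts invented for illustration
sample_counts = {"864142": 400, "563466": 500, "724": 100}
control_counts = {"864142": 1, "724": 3}
adjusted = subtract_control_fraction(sample_counts, control_counts)
print(adjusted)
```

Because the negative control usually contains far fewer reads than a real sample, its proportions can be noisy, so the adjusted counts are better treated as a rough screen than as corrected abundances.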
