I have sequencing data from an organism, but it contains three 16S rRNA sequences that belong to three different organisms. I suspect the sample is contaminated. How can I extract the contigs belonging to each organism present in the sequence data?
If you truly feel that there are three organisms, then you can use bbsplit.sh (from the BBMap suite) to bin your reads into respective organismal pools. This will generally work well as long as the bacteria are distinct enough. You can decide what to do with reads that multi-map (map to more than one of the reference genomes), e.g. keep them in all bins, toss them, etc.
Use the answer here and ask if you have any questions: A: Tool to separate human and mouse RNA-seq reads
Since you have bacterial data you could turn off maxindel=0.
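To make the bbsplit.sh suggestion above concrete, here is a minimal sketch that only assembles a command line for paired-end reads and prints it, without running anything. The file names, the output naming pattern and the helper name `bbsplit_command` are hypothetical placeholders. The parameters shown (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`, `ambiguous2`) are the ones I believe bbsplit.sh accepts, but please check them against `bbsplit.sh --help` for your BBMap version.

```python
import shlex

# Values bbsplit.sh is expected to accept for ambiguous2= (verify with --help).
AMBIGUOUS2_MODES = {"best", "toss", "all", "split"}


def bbsplit_command(reads1, reads2, references, ambiguous2="toss"):
    """Assemble (but do not run) a bbsplit.sh command for paired-end reads."""
    if ambiguous2 not in AMBIGUOUS2_MODES:
        raise ValueError(f"unexpected ambiguous2 mode: {ambiguous2}")
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        # Several references are given as one comma-separated list.
        "ref=" + ",".join(references),
        # % is replaced by the reference name, # by the read number (1 or 2).
        "basename=out_%_#.fq",
        # Reads that map to none of the references.
        "outu1=unmapped_1.fq",
        "outu2=unmapped_2.fq",
        # What to do with reads that map to more than one reference.
        f"ambiguous2={ambiguous2}",
    ]


if __name__ == "__main__":
    # Hypothetical input files; replace with your own.
    cmd = bbsplit_command(
        "sample_R1.fq.gz",
        "sample_R2.fq.gz",
        ["organismA.fa", "organismB.fa", "organismC.fa"],
    )
    # shlex.join quotes the % and # characters so the line is safe to paste.
    print(shlex.join(cmd))
```

Using `ambiguous2="all"` instead of `"toss"` would keep multi-mapping reads in every bin they match rather than discarding them.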
Yes, thank you. I could separate the reads mapping to each reference genome, but I still have one doubt. When I use Bowtie or Bowtie2, the paired-end reads of my data do not map to the reference genome, even though the 16S sequence of the reference genome is present in my data. Why could that happen?
Hello,
bbduk might help you. From the web manual:
In the ref parameter you can define more than one reference. Have a look at bbduk.sh --help for more options.
fin swimmer
Thanks a lot. Let me check and let you know.
Hi Finswimmer,
I observed that increasing the k-mer value decreases the number of matched reads. What would be the ideal k-mer size for paired reads of length 150 bp? How does the interpretation of the matched-read results change with k-mer size?
Because the k= value is used to find the initial match, if you set it too high then BBMap tools are not going to find any (or will find fewer) initial matches. So no surprise there. Generally, setting k= to something between 20-30 is fine for most applications. Smaller values require more memory.
Thank you, this too worked for me.
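To illustrate the point about k= above, here is a small sketch that counts how many k-mers in a read remain free of mismatches. It assumes exact k-mer matching, which is a simplification of what the tools do, and the mismatch positions are made up for the example. The function name `error_free_kmers` is hypothetical.

```python
def error_free_kmers(read_length, k, mismatch_positions):
    """Count k-mer windows in a read that do not overlap any mismatch.

    Positions are 0-based. Only windows counted here could seed an exact
    k-mer match against the reference.
    """
    count = 0
    for start in range(read_length - k + 1):
        end = start + k  # window covers positions start .. end - 1
        if not any(start <= pos < end for pos in mismatch_positions):
            count += 1
    return count


if __name__ == "__main__":
    read_length = 150
    # Made-up example: one mismatch every 25 bases relative to the reference.
    mismatches = [24, 49, 74, 99, 124, 149]
    for k in (15, 21, 25, 31):
        usable = error_free_kmers(read_length, k, mismatches)
        total = read_length - k + 1
        print(f"k={k}: {usable} of {total} k-mers are mismatch-free")
```

With this spacing, the longest mismatch-free stretch is 24 bases, so any k of 25 or more leaves no usable k-mer and the read would go unmatched, while shorter k values still leave several. The more your reads diverge from the reference, the lower k has to be for them to match at all.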