Question

Bacterial reads in ATAC-seq

4.0 years ago

C4 ▴ 30

I made single-cell ATAC-seq libraries from human brain, but up to 70% of the reads do not map to the human genome, and these unmapped reads have their best hits to bacteria. Can someone please suggest how to reduce bacterial contamination in ATAC-seq? Could these be multi-mapping reads, and how could I remove them from the FASTQ files? Does sequencing to 10x more depth than recommended cause bacteria reads to occur if library complexity is low? Thank you!

sequencing next-gen • 1.5k views

ADD COMMENT • link 4.0 years ago by C4 ▴ 30

Does sequencing to 10x more depth than recommended cause bacteria reads to occur if library complexity is low?

Short answer is no. Did you get your sequencing done by a provider that offers deep discounts? If so, your samples may have been run as part of a larger pool (do your samples have dual or single indexes?), and index hopping may have let reads from other samples sneak into your data. Or your samples could have been contaminated somewhere along the way (either by you or by your sequencing provider).

You could simply use bbsplit.sh with the human genome as the reference, keep the reads that map, and discard the others (see: Tool to separate human and mouse rna seq reads).
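For illustration, here is a minimal Python sketch that only assembles such a bbsplit.sh command line; it does not run anything. The file names are hypothetical placeholders, and the `in=`, `ref=`, `basename=` and `outu=` parameters are written as I recall them from the BBTools documentation, so please check them against `bbsplit.sh --help` for your installed version. Paired-end input and the separate 10x barcode read would need extra handling that is not shown here.

```python
def build_bbsplit_command(reads_in, human_ref, mapped_pattern, unmapped_out):
    """Return the argument list for a BBSplit run (nothing is executed here)."""
    return [
        "bbsplit.sh",
        f"in={reads_in}",              # input FASTQ (hypothetical file name)
        f"ref={human_ref}",            # human reference FASTA (hypothetical file name)
        f"basename={mapped_pattern}",  # '%' is replaced by the reference name
        f"outu={unmapped_out}",        # reads that did not map to the reference
    ]


cmd = build_bbsplit_command(
    "sample_R1.fastq.gz",
    "human.fa",
    "mapped_%.fastq.gz",
    "unmapped.fastq.gz",
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the parameters have been verified.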

BTW: What kind of single-cell libraries are these? If these are 10x Genomics libraries, then there should be no issues like this.

ADD REPLY • link 4.0 years ago by GenoMax 147k

Thanks for your response. We sequenced them on an Illumina NovaSeq; all my libraries were multiplexed and run on one lane. These are 10x Genomics dual-index scATAC libraries. It is hard to imagine how they may have been contaminated.

ADD REPLY • link 4.0 years ago by C4 ▴ 30

Were these libraries run following the specific sequencing recommendations from 10x? You should use cellranger-atac to process this data. Is that what you used?

ADD REPLY • link 4.0 years ago by GenoMax 147k

Yes, they were run using the recommendations from 10x, except that we opted for 100bp R1/R2 rather than 50bp R1/R2, which from our past analysis should not cause any issues. I also processed them with cellranger-atac. The recommended sequencing depth is 50,000 reads per nucleus, and we had ~4000 nuclei per sample, so 200M reads per sample. We ended up sequencing to ~800M reads per sample, which is 4 times the recommended depth. Our mapping to the human genome was thus only 25%: about 200M reads mapped, but the rest had hits to bacteria. The extra sequencing depth was therefore my concern. We expect some bacteria in the samples since these library preps were not done in a laminar flow hood, but not to the extent of 50-60% of the total reads, which is strange!
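The depth arithmetic above can be checked with a short Python sketch. The function name and dictionary keys are made up for illustration; the numbers are the approximate figures quoted in this post.

```python
def depth_summary(nuclei, reads_per_nucleus, total_reads, mapped_reads):
    """Summarise sequencing depth relative to the recommended depth."""
    recommended = nuclei * reads_per_nucleus
    return {
        "recommended_reads": recommended,
        "fold_over_recommended": total_reads / recommended,
        "mapped_fraction": mapped_reads / total_reads,
    }


# ~4000 nuclei, 50,000 reads per nucleus, ~800M reads sequenced, ~200M mapped
summary = depth_summary(4_000, 50_000, 800_000_000, 200_000_000)
print(summary)
```

With these inputs the recommended depth is 200M reads, the run is 4 times over that, and the mapped fraction is 0.25.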

ADD REPLY • link 4.0 years ago by C4 ▴ 30

Sounds like you did everything (mostly) by the book, so if the contamination is real there is not much you can do with this dataset. Contaminated reagents are a remote possibility. You could check with 10x/Illumina to see if the lot numbers you used have any known issues. Otherwise you may have to cut your losses and start over.