Quality Analysis of PacBio Data
12.7 years ago
Abhi ★ 1.6k

Hey Guys

Just wondering if you guys have come across or developed any process to check for contamination in PacBio reads. By contamination I mean aligning or BLASTing the PacBio reads against a reference (such as nt or 16S) to see whether a proportion of the reads hit something unexpected, which would point to possible contamination.
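A minimal sketch of the "proportion of reads hitting something unexpected" check, assuming BLAST has already been run with tabular output (`-outfmt 6`, default 12 columns) and that you have your own mapping from subject IDs to taxa. The `taxon_map` dict, the subject IDs and the rows below are made up for illustration. The sketch keeps the highest-bitscore hit per read and reports the fraction of reads (among those with any hit) whose best hit falls outside the expected taxa; reads with no hit at all are not counted.

```python
import csv
import io
from collections import Counter

def best_hits(lines):
    """Keep the highest-bitscore hit per read from BLAST tabular output.

    Assumes -outfmt 6 with the default 12 columns: column 1 is the query ID,
    column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def unexpected_fraction(best, subject_to_taxon, expected_taxa):
    """Fraction of reads with a hit whose best hit maps outside expected_taxa."""
    if not best:
        return 0.0
    taxa = Counter(subject_to_taxon.get(s, "unknown") for s, _ in best.values())
    unexpected = sum(n for taxon, n in taxa.items() if taxon not in expected_taxa)
    return unexpected / len(best)

# Made-up rows: read1 hits a bacterial reference best, read2 hits human.
blast_tsv = (
    "read1\tecoli_ref\t88.1\t950\t30\t80\t1\t950\t1\t940\t0.0\t1200\n"
    "read1\thuman_chr1\t80.0\t400\t50\t30\t1\t400\t1\t390\t1e-50\t300\n"
    "read2\thuman_chr1\t86.0\t900\t40\t70\t1\t900\t100\t990\t0.0\t1100\n"
)
best = best_hits(io.StringIO(blast_tsv))
taxon_map = {"ecoli_ref": "E. coli", "human_chr1": "human"}  # hypothetical mapping
frac = unexpected_fraction(best, taxon_map, expected_taxa={"human"})
print(best["read1"][0], frac)  # ecoli_ref 0.5
```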

Since there are inherent indel errors in PacBio data, standard BLAST penalizes the alignments (because many gaps have to be introduced), which could result in reads not aligning at all.
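To see why indels hurt more than substitutions, here is a toy calculation with affine gap scoring, where each gap of length k costs `gap_open + k * gap_extend`. The scoring values are illustrative assumptions, not necessarily the defaults of any particular BLAST mode, and the 15% error rate is hypothetical.

```python
def affine_score(matches, mismatches, gap_lengths,
                 reward=2, penalty=-3, gap_open=5, gap_extend=2):
    """Raw score of an alignment under a simple affine-gap scheme.

    Each gap of length k costs gap_open + k * gap_extend.
    All scoring values are illustrative assumptions.
    """
    score = matches * reward + mismatches * penalty
    for k in gap_lengths:
        score -= gap_open + k * gap_extend
    return score

# Hypothetical 1,000-column alignment with 150 error columns (15%).
perfect = affine_score(1000, 0, [])
substitutions_only = affine_score(850, 150, [])  # 150 mismatches, no gaps
indels_only = affine_score(850, 0, [1] * 150)    # 150 separate 1-bp gaps

print(perfect, substitutions_only, indels_only)  # 2000 1250 650
```

Scattered single-base indels each pay the gap-open cost, so the score falls much faster than with the same number of substitutions. In practice, BLAST's exact-match word seeds are also harder to find in noisy reads.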

It would be nice to know what you guys are doing for contamination checks on PacBio long reads.

Best, -Abhi


Hi, every once in a while I run into contamination problems. I created a BLASR index of almost all prokaryotes and just aligned to that (using BLASR). The only caveat is that I never updated BLASR to allow for larger-than-32-bit indexing, so the largest database you can use is about 4G (2^32) bases. NCBI's set of prokaryotes is larger than that, so I pruned it down by removing redundant genomes from the same strain. If you are looking for human contamination, you can just align to the human genome in one go.
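A sketch of the pruning step described above, assuming the prokaryote collection is a single multi-FASTA with one record per genome and that a strain label can be derived from each header. The `strain_of` function, the header layout and the toy records are hypothetical; adapt them to your own headers. It keeps the first genome seen per strain and skips any genome that would push the total past 2^32 bases.

```python
LIMIT = 2**32  # positions addressable with a 32-bit index (~4.29 billion)

def genome_sizes(fasta_lines):
    """Return [(header, sequence_length), ...] for each record in a FASTA stream."""
    records, header, length = [], None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, length))
            header, length = line[1:], 0
        else:
            length += len(line)
    if header is not None:
        records.append((header, length))
    return records

def prune_by_strain(records, strain_of, limit=LIMIT):
    """Keep the first genome per strain, skipping any that would exceed limit bases in total."""
    seen, kept, total = set(), [], 0
    for header, length in records:
        key = strain_of(header)
        if key in seen or total + length > limit:
            continue
        seen.add(key)
        kept.append(header)
        total += length
    return kept, total

def strain_of(header):
    """Hypothetical convention: the strain label is the 2nd whitespace-separated field."""
    return header.split()[1]

# Toy input with made-up records.
toy_fasta = """>g1 strainA
ACGTACGT
>g2 strainA
ACGT
>g3 strainB
ACGTAC
""".splitlines()

records = genome_sizes(toy_fasta)
print(prune_by_strain(records, strain_of))            # (['g1 strainA', 'g3 strainB'], 14)
print(prune_by_strain(records, strain_of, limit=10))  # (['g1 strainA'], 8)
```

Keeping the first genome per strain is an arbitrary choice; you might prefer the most complete assembly instead.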

-mark

12.2 years ago
Irsan ★ 7.8k

If you suspect your dataset contains DNA from multiple species, you might also want to plot the GC distribution of your reads. For example, reads sampled from a human genome have a roughly bell-shaped GC-content distribution centered near 41%, whereas many bacteria have a clearly different mean GC content (some around 65%), so a contaminant shows up as a second peak in your GC-content distribution plot. You can try FastQC on your data and it will produce the GC-content distribution plot for you :-)
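For a quick look without FastQC, a minimal sketch that bins reads by GC percentage. It assumes the reads are already loaded as plain strings (FASTA/FASTQ parsing omitted) and ignores non-ACGT characters such as N. The toy reads are made up to show two peaks.

```python
from collections import Counter

def gc_percent(seq):
    """GC percentage of a read, counting only A/C/G/T (N and other codes are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0

def gc_histogram(reads, bin_width=5):
    """Map each bin's lower edge (in %) to the number of reads falling in it."""
    bins = Counter(int(gc_percent(r) // bin_width) * bin_width for r in reads)
    return dict(sorted(bins.items()))

reads = ["AACCGGTTAA", "ATCGATCGAT",   # 40% GC
         "GCGCGCGATA", "CCGGCCGATT"]   # 70% GC
print(gc_histogram(reads))  # {40: 2, 70: 2}
```

A second, well-separated peak suggests reads from an organism with a different GC content; a contaminant with a similar GC content will not separate this way.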

12.7 years ago
Bioinfosm ▴ 620

This could get you started:
http://oelemento.wordpress.com/2011/01/03/a-closer-look-at-the-first-pacbio-sequence-dataset/

It seems BLAST would help to some extent despite the small indel errors, and if they are consensus reads, the indel noise is much lower.
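A toy model of why consensus reads are cleaner: if each pass over the same molecule makes an error at a given position independently with probability p, and we conservatively count the consensus as wrong whenever at least half of the n passes are wrong, the consensus error rate is a binomial tail. Independence and the 15% per-pass error rate are assumptions for illustration; real consensus calling handles indels explicitly and uses more sophisticated models.

```python
from math import comb

def consensus_error(p, n):
    """Probability that at least half of n independent passes are wrong at a position.

    Conservative: ties and disagreeing wrong calls both count as consensus errors.
    """
    threshold = (n + 1) // 2  # smallest k with k >= n / 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold, n + 1))

# Hypothetical 15% per-pass error rate; more passes -> lower consensus error.
for n in (1, 3, 5, 7):
    print(n, round(consensus_error(0.15, n), 4))
```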
