Question

FASTA contaminant contig mapping/pulldown?

0

5.4 years ago

ctseto ▴ 310

I'm curious how much contaminant sequence got through my pulldown of contaminant FASTQ reads in NGS data, which I'd like to assess from what comes out in the de novo assembly. Would using something like minimap to map the reads against a reference be appropriate? I imagine there are better methods than making SAM/BAM files...

FASTA Assembly • 986 views

updated 5.4 years ago by Charles Warden 8.3k • written 5.4 years ago by ctseto ▴ 310

0

my pulldown of contaminant fastq in NGS

What does this mean? You knew your data was contaminated and it was still assembled with the contaminants in place?

5.4 years ago by GenoMax 146k

score 0 · Answer 1 · 2019-06-04

I'm not 100% sure I understand your question:

1) If you are trying to filter out a large fraction of unintended sequence (such as a sample with both pathogen + host sequence), a joint alignment against both references may be useful for filtering reads that align to the host. If you align to the host sequence alone, some reads can align to both sequences while being more similar to the pathogen, and those would be filtered out as host.

In this situation, I understand the use of FASTA: it serves as the reference for aligning the Illumina FASTQ reads.
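As a rough illustration of the joint-alignment idea, here is a minimal Python sketch. It assumes the reads were mapped against a single combined host + pathogen FASTA and that the alignments are in PAF format (which minimap2 writes by default, so no SAM/BAM is needed for this step). The `host_` name prefix on host sequences, the function name, and the toy records are all hypothetical, made up for the example. Each read is assigned to whichever reference gives it the most matching bases (PAF column 10); on a tie the first alignment seen is kept.

```python
def assign_reads_from_paf(paf_lines, host_prefix="host_"):
    """Label each read 'host' or 'other' by its best hit in a joint alignment.

    Assumes host sequences in the combined reference were renamed to start
    with `host_prefix` (a hypothetical naming convention for this sketch).
    """
    best = {}  # read name -> (matching bases, target name)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        # PAF columns: 1 = query name, 6 = target name, 10 = matching bases
        read, target, matches = fields[0], fields[5], int(fields[9])
        if read not in best or matches > best[read][0]:
            best[read] = (matches, target)
    return {
        read: ("host" if target.startswith(host_prefix) else "other")
        for read, (_, target) in best.items()
    }


# Toy PAF records (made-up values): r1 hits both references, r2 only the host.
example_paf = [
    "r1\t150\t0\t150\t+\thost_chr1\t5000\t100\t250\t90\t150\t3",
    "r1\t150\t0\t150\t+\tphage_1\t5386\t10\t160\t140\t150\t30",
    "r2\t150\t0\t150\t-\thost_chr1\t5000\t900\t1050\t148\t150\t60",
]
labels = assign_reads_from_paf(example_paf)
```

Here r1 would be kept as non-host because its second hit matches better than its host hit, which is the case a host-only alignment would get wrong.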

2) If you are asking whether there can be unexpected sequences, the answer is "Yes," but they should hopefully make up a small fraction of your sample (and may be less important to filter for your de novo assembly). Cross-contamination from projects with different species would be fairly clear (and possibly detected with FastQ Screen?), but otherwise PhiX sequence in your FASTA / FASTQ reads is something you can check for (even though there theoretically shouldn't be any such sequence in your de-multiplexed samples, unless it was intentionally added for that specific library):

C: Calling Single-Barcode Samples from Mixed Runs as Dual-Barcode Samples | Possibl

However, in that situation, I am confused why you are asking about FASTA (versus FASTQ) sequences.