Question

Combined bacteria and archaea amplicon libraries in one MiSeq run


8.6 years ago by dpitta

I'm not able to split libraries (demultiplex) from a MiSeq run in which bacterial and archaeal libraries were sequenced together, and the sequence output is very low. Can anyone suggest a fix for splitting libraries when amplicon libraries were combined and sequenced together? I did use different barcodes.




Input file paths

Mapping filepath: mastermapping.txt (md5: b630a7cb1c5fc1e83b75e7376b467dc5)
Sequence read filepath: qiime_joined_new/fastqjoin.join.fastq (md5: d27601bef25a468e02846681f8c680a9)
Barcode read filepath: qiime_joined_new/fastqjoin.join_barcodes.fastq (md5: ded0228f420b3f88b5e963b2339c75f9)

Quality filter results
Total number of input sequences: 13861241
Barcode not in mapping file: 3954873
Read too short after quality truncation: 437474
Count of N characters exceeds limit: 9404492
Illumina quality digit = 0: 0
Barcode errors exceed max: 0

Result summary (after quality filtering)
Median sequence length: 255.00
SA045   1320
SA047   1153
SA046   1065
SA025   1040
SA029   1039
SA019   966
SA017   963
SA035   960
SA027   957
SA034   930
SA009   930
SA033   929
SA039   906
SA006   897
SA018   892
SA026   889
SA016   876
SA048   869
SA071   867
SA008   858
SA058   853
SA012   853
SA022   851
SA005   848
SA030   844
SA038   839
SA059   824
SA069   821
SA031   814
SA013   800
SA007   795
SA004   791
SA042   783
SA037   783
SA002   777
SA011   776
SA014   767
SA001   761
SA086   760
SA060   755
SA023   754
SA028   743
SA036   733
SA021   729
SA053   728
SA010   725
SA070   722
SA044   718
SA068   717
SA003   706
SA090   690
SA074   680
SA043   673
SA065   668
SA056   650
SA020   650
SA041   647
SA055   640
SA057   637
SA015   630
SA062   618
SA024   616
SA067   611
SA051   603
SA061   601
SA072   600
SA063   590
SA066   571
SA087   566
SA040   566
SA078   543
SA064   530
SA094   529
SA093   528
SA054   513
SA075   512
SA085   489
SA073   446
SA084   440
SA077   440
SA052   428
SA091   410
SA088   405
SA081   394
SA095   385
SA076   370
SA089   341
SA096   311
SA082   301
SA092   267
SA083   263
SA098   147
SA114   30
SA115   22
SA116   21
SA109   20
SA101   19
SA113   18
SA104   16
SA110   13
SA107   12
SA102   9
SA079   8
SA111   7
SA105   7
SA112   6
SA103   6
SA108   5
SA106   5
SA100   1
SA099   1
SA032   1
SA118   0
SA117   0
SA097   0
SA080   0

Total number seqs written   64402

Most of the sequences were removed by the "Count of N characters exceeds limit" filter. The N limit was left at its default of 0.
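With a limit of 0, a single N anywhere in a read is enough to throw it away. A minimal sketch of that rule as the log describes it (the function names are hypothetical and this is not QIIME's actual code):

```python
def count_n(seq: str) -> int:
    """Number of ambiguous base calls (N) in a read, case-insensitive."""
    return seq.upper().count("N")


def passes_n_filter(seq: str, max_n: int = 0) -> bool:
    """Keep a read only if its N count does not exceed max_n.

    With the default max_n = 0, one N is enough to reject the read.
    """
    return count_n(seq) <= max_n


if __name__ == "__main__":
    # Compare the default limit (0) with a more permissive limit (1).
    for read in ["ACGTACGT", "ACGNACGT", "NNGTACGT"]:
        print(read, passes_n_filter(read), passes_n_filter(read, max_n=1))
```

Raising the limit would let more reads through, but it would not fix the underlying basecalling problem.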

written 8.6 years ago by dpitta, updated by GenoMax


Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

There are two lines of interest in your QIIME log.

Barcode not in mapping file: 3954873

Some reads have barcodes that are not in your mapping file. This does happen and may be acceptable. The second line is of more concern.
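To see what those unexpected barcodes actually are, you could tally them yourself. A minimal sketch, assuming a tab-separated QIIME-style mapping file whose header starts with `#SampleID` and has a `BarcodeSequence` column, and a standard 4-line barcode FASTQ. It uses exact matching only (no barcode error correction), so counts may differ somewhat from QIIME's. If the most common unexpected barcodes are full of N's, that points to the same basecalling problem.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, Optional, Set


def read_mapping_barcodes(lines: Iterable[str]) -> Set[str]:
    """Collect barcodes from a tab-separated, QIIME-style mapping file.

    Assumes a header line starting with '#SampleID' that contains a
    'BarcodeSequence' column; any other line starting with '#' is a comment.
    """
    barcodes: Set[str] = set()
    col: Optional[int] = None
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#SampleID"):
            col = fields.index("BarcodeSequence")
        elif line.startswith("#") or not line.strip():
            continue
        elif col is not None:
            barcodes.add(fields[col].strip().upper())
    return barcodes


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (second) line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def unexpected_barcodes(barcode_reads: Iterable[str], known: Set[str]) -> Counter:
    """Count barcode reads that match no mapping-file barcode exactly."""
    return Counter(bc for bc in barcode_reads if bc not in known)


if __name__ == "__main__":
    if len(sys.argv) > 2:
        with open(sys.argv[1]) as fh:
            known = read_mapping_barcodes(fh)
        with open(sys.argv[2]) as fh:
            missing = unexpected_barcodes(fastq_sequences(fh), known)
        print("reads with barcodes not in the mapping file:", sum(missing.values()))
        for bc, n in missing.most_common(10):
            print(bc, n)
    else:
        print("usage: <script> MAPPING_FILE BARCODE_FASTQ")
```

With the files from the log, that would be `mastermapping.txt` and `qiime_joined_new/fastqjoin.join_barcodes.fastq`.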

Count of N characters exceeds limit: 9404492

I am not a QIIME user, but I assume this refers to reads that actually contain N's (can you look at a few reads and post them here?). This amounts to almost 70% of your data (9,404,492 of 13,861,241 reads). That can happen when low-nucleotide-diversity libraries are over-clustered: the basecalling software then has difficulty calling bases, which is why spiking in neutral genomic DNA becomes essential.
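A minimal sketch for checking this directly, assuming a standard 4-line FASTQ: it counts how many reads contain at least one N and which positions the N's fall on. For the joined reads named in the log, positions only line up with read 1 cycles at the start of each read; the tail of a joined read comes from read 2.

```python
import sys
from collections import Counter
from typing import Iterable, Tuple


def n_stats(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Scan FASTQ lines; return (total reads, reads with >= 1 N, N count per position).

    Positions are 1-based along the read as stored in the file.
    """
    total = with_n = 0
    per_pos: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 1:  # keep only the sequence line of each 4-line record
            continue
        seq = line.strip().upper()
        total += 1
        n_positions = [pos for pos, base in enumerate(seq, start=1) if base == "N"]
        if n_positions:
            with_n += 1
            per_pos.update(n_positions)
    return total, with_n, per_pos


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            total, with_n, per_pos = n_stats(fh)
        share = 100.0 * with_n / total if total else 0.0
        print(f"{with_n} of {total} reads contain at least one N ({share:.1f}%)")
        print("positions with the most N's:")
        for pos, count in per_pos.most_common(15):
            print(f"  position {pos}: {count}")
    else:
        print("usage: <script> FASTQ  (e.g. qiime_joined_new/fastqjoin.join.fastq)")
```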

I assume you received this data from a sequencing provider (can you ask them about the spiked-in genomic DNA %, cluster number and PF%?). If there are many N's (positions where the software could not call a definite base) in the actual reads or in the barcodes, this run will have to be repeated. Your provider should not have released this data if it contained that many N's (unless it was done for diagnostic reasons). Depending on your relationship with the provider, it would be best to go back and discuss what happened and what they may be willing to do. If you did not clearly tell them that these were low-nucleotide-diversity libraries, be prepared to pay for a re-run.

8.6 years ago by GenoMax

Answer · 2016-12-22

What kind of libraries are mixed in a pool generally does not matter, as long as each sample has a unique barcode. The exception is libraries expected to have low nucleotide diversity (edit: which is the case here, since these are amplicons).
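Amplicon reads typically all start with the same primer sequence, so at a given cycle nearly every cluster shows the same base. A minimal sketch that makes this visible from a FASTQ file (for example a raw read 1 file, or the joined file from the log) by reporting, per position, the fraction of reads that carry the most common base. Values near 1.0 indicate low diversity. The 100,000-read sample and 30-position window are arbitrary choices for illustration.

```python
import sys
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def dominant_base_fraction(seqs: Iterable[str], n_positions: int) -> List[float]:
    """Per position, the fraction of reads covering it that carry its most common base.

    Values near 1.0 mean almost every cluster shows the same base at that cycle,
    i.e. low nucleotide diversity.
    """
    counts: List[Counter] = [Counter() for _ in range(n_positions)]
    for seq in seqs:
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    fractions: List[float] = []
    for c in counts:
        covered = sum(c.values())
        fractions.append(c.most_common(1)[0][1] / covered if covered else 0.0)
    return fractions


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            # A sample of reads is enough to see the pattern.
            sample = islice(fastq_sequences(fh), 100_000)
            fractions = dominant_base_fraction(sample, 30)
        for pos, frac in enumerate(fractions, start=1):
            print(f"position {pos:2d}: {frac:.2f} of reads share the most common base")
    else:
        print("usage: <script> FASTQ")
```

A shotgun genomic library would show values around 0.25 to 0.4 at every position, which is why spiking one in (e.g. phiX) helps the basecaller.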

If the sequence output is low but the samples demultiplexed fine, then loading more library on a new run may be warranted^^. If you have enough reads but they are not being demultiplexed (i.e. they end up in the "undetermined" reads pool), that will require a different approach.

Can you clarify which of these two cases we are working with here?

^^ Do you know whether neutral genomic DNA (e.g. phiX) was spiked into this run? If so, at what percentage? What were the cluster density (K/mm^2) and the PF% (percentage of clusters passing filter)?
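The split-libraries log itself can help answer which of the two cases applies. A minimal sketch that pulls the counters out of a log laid out like the one posted in this thread (`Label: count` lines, plus the colon-less `Total number seqs written` line) and expresses each as a percentage of the input reads. The regular expressions assume that exact layout; the function names are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable

TOTAL = "Total number of input sequences"
WRITTEN = "Total number seqs written"
# "Label: 123" counters, e.g. "Barcode not in mapping file: 3954873"
COUNT_RE = re.compile(r"^(?P<label>[^:]+?)\s*:\s*(?P<count>\d+)$")
# The written total has no colon in the log: "Total number seqs written   64402"
WRITTEN_RE = re.compile(r"^Total number seqs written\s+(?P<count>\d+)$")


def parse_log(lines: Iterable[str]) -> Dict[str, int]:
    """Pull the integer counters out of a split-libraries style log."""
    counts: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        m = WRITTEN_RE.match(line)
        if m:
            counts[WRITTEN] = int(m.group("count"))
            continue
        m = COUNT_RE.match(line)
        if m:
            counts[m.group("label")] = int(m.group("count"))
    return counts


def percent_of_input(counts: Dict[str, int]) -> Dict[str, float]:
    """Express every counter as a percentage of the input reads."""
    total = counts[TOTAL]
    return {label: 100.0 * n / total for label, n in counts.items() if label != TOTAL}


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            shares = percent_of_input(parse_log(fh))
        for label, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
            print(f"{pct:6.2f}%  {label}")
    else:
        print("usage: <script> SPLIT_LIBRARIES_LOG")
```

On the numbers posted above, this works out to about 67.8% of the 13,861,241 input reads removed by the N filter, about 28.5% with barcodes not in the mapping file, about 3.2% too short after quality truncation, and only about 0.46% (64,402 reads) written. Those four counts add up exactly to the input total, so the reads were sequenced but are being discarded rather than demultiplexed.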