Question

Different number of sequences on the same file between two tools

4.4 years ago

pablo ▴ 310

Hello,

I ran a demultiplexing analysis on a PacBio sequencing data file. It produced 20 BAM files, one per bacterium, along with a summary file:

     IdxFirst  IdxCombined  IdxFirstNamed  IdxCombinedNamed  Counts  MeanScore
        6         6            bc1008         bc1008            62939   73
        20        20           bc1023         bc1023            51303   70
        21        21           bc1024         bc1024            62978   69
        22        22           bc1026         bc1026            48417   70
        23        23           bc1027         bc1027            17737   70
        24        24           bc1028         bc1028            34801   71
        25        25           bc1029         bc1029            38043   67
        27        27           bc1031         bc1031            113230  69 
....

For example, for the first bacterium (bc1008), it reports a count of 62939, which I take to mean 62939 contigs.

Then I converted the BAM files to FASTA. I ran gt seqstat from the GenomeTools library on each file to get more statistics (N50, mean size, ...). For the first file, corresponding to the first bacterium (bc1008), I get:

# number of contigs:     222576
# total contigs length:  2071178900
# mean contig size:      9305.49
# contig size first quartile: 6629
# median contig size:         8811
# contig size third quartile: 11619
# longest contig:             113113
# shortest contig:            51
# contigs > 500 nt:           217752 (97.83 %)
# contigs > 1K nt:            214604 (96.42 %)
# contigs > 10K nt:           84423 (37.93 %)
# contigs > 100K nt:          6 (0.00 %)
# contigs > 1M nt:            0 (0.00 %)
# N50:                   10722
# L50:                   70163
# N80:                   7754
# L80:                   138032

It finds 222576 contigs, which is completely different from the number reported by the demultiplexing analysis. I can't figure out why...

Any suggestions?

sequencing counts genometools pacbio bam • 950 views

ADD COMMENT • link updated 4.3 years ago by Biostar 20 • written 4.4 years ago by pablo ▴ 310

Did you use the PacBio tools, bam2fastx and lima, for this conversion/demultiplexing? If not, I would recommend using those specific tools.

ADD REPLY • link 4.4 years ago by GenoMax 147k

I used the samtools package to convert the BAM files to FASTA. I will try bam2fastx and let you know whether it resolves the discrepancy.

ADD REPLY • link 4.4 years ago by pablo ▴ 310
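
One way to narrow down the discrepancy is to check whether the FASTA file holds several records per sequencing well (ZMW). The sketch below is only a diagnostic under stated assumptions, not a confirmed explanation: it assumes the read names follow the PacBio layout `movie/holeNumber/qStart_qEnd`, and it assumes the Counts column might tally ZMWs rather than individual reads. For bc1008 the two numbers differ by a factor of about 3.5 (222576 / 62939), so comparing the number of FASTA records with the number of distinct `movie/holeNumber` pairs could show whether that factor comes from multiple reads per ZMW. The function name and the example read names are hypothetical.

```python
from typing import Iterable, Tuple


def count_records_and_zmws(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of FASTA records, number of distinct movie/ZMW pairs).

    Assumption: headers look like '>movie/holeNumber/qStart_qEnd'.
    Headers without a '/' are counted under their full name.
    """
    n_records = 0
    zmws = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        n_records += 1
        fields = line[1:].split()
        name = fields[0] if fields else ""
        parts = name.split("/")
        # Keep only 'movie/holeNumber' so reads from one ZMW collapse together.
        zmw_key = "/".join(parts[:2]) if len(parts) >= 2 else name
        zmws.add(zmw_key)
    return n_records, len(zmws)


# Hypothetical in-memory example: three reads coming from two ZMWs.
example = [
    ">m1/10/0_500", "ACGT",
    ">m1/10/550_1000", "ACGT",
    ">m1/11/0_800", "ACGT",
]
records, zmws = count_records_and_zmws(example)
print(records, zmws)  # 3 2
```

To apply it to a real file, pass an open file handle, for example `count_records_and_zmws(open("bc1008.fasta"))` (file name hypothetical). If the second number is close to 62939 while the first is 222576, the two tools are counting different units rather than disagreeing about the data.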