Different number of sequences for the same file between two tools
4.4 years ago
pablo ▴ 310

Hello,

I ran a demultiplexing analysis on a PacBio sequencing data file. It returned 20 BAM files, one for each of 20 bacteria. The analysis also produced this file:

     IdxFirst  IdxCombined  IdxFirstNamed  IdxCombinedNamed  Counts  MeanScore
        6         6            bc1008         bc1008            62939   73
        20        20           bc1023         bc1023            51303   70
        21        21           bc1024         bc1024            62978   69
        22        22           bc1026         bc1026            48417   70
        23        23           bc1027         bc1027            17737   70
        24        24           bc1028         bc1028            34801   71
        25        25           bc1029         bc1029            38043   67
        27        27           bc1031         bc1031            113230  69 
....

For example, for the first bacterium (bc1008), it finds 62939, which I take to correspond to 62939 contigs.

Then I converted the BAM files to FASTA. I ran gt seqstat from the GenomeTools library on each file to get more statistics (N50, mean size, ...). For the first file, corresponding to the first bacterium (bc1008), I get:

# number of contigs:     222576
# total contigs length:  2071178900
# mean contig size:      9305.49
# contig size first quartile: 6629
# median contig size:         8811
# contig size third quartile: 11619
# longest contig:             113113
# shortest contig:            51
# contigs > 500 nt:           217752 (97.83 %)
# contigs > 1K nt:            214604 (96.42 %)
# contigs > 10K nt:           84423 (37.93 %)
# contigs > 100K nt:          6 (0.00 %)
# contigs > 1M nt:            0 (0.00 %)
# N50:                   10722
# L50:                   70163
# N80:                   7754
# L80:                   138032
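
For readers unfamiliar with the N50 and L50 values in the output above, here is a minimal sketch of how such statistics are commonly computed. It assumes the usual definition (the length of the shortest sequence in the smallest set of longest sequences covering at least the given fraction of the total length); it is not taken from the GenomeTools source, and the function name `nxx` is made up for illustration.

```python
def nxx(lengths, fraction=0.5):
    """Return (Nxx, Lxx) for a list of sequence lengths.

    Nxx: length of the sequence at which the running total, taken from
         the longest sequence downwards, first reaches `fraction` of the
         total length.
    Lxx: how many sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no lengths given")


# Toy example: total length is 20, so half is 10.
# 6 + 5 = 11 >= 10, hence N50 = 5 and L50 = 2.
print(nxx([2, 3, 4, 5, 6]))  # (5, 2)
```

Calling `nxx(lengths, 0.8)` gives the N80/L80 pair under the same assumed definition.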

It finds 222576 contigs, which is completely different from the number of contigs reported by the demultiplexing analysis. I can't figure out why...

Any suggestion?

sequencing counts genometools pacbio bam • 949 views

Did you use the PacBio tools, bam2fastx and lima, for this conversion/demultiplexing? If not, I would recommend using those specific tools.


I used the samtools package to convert the BAM files to FASTA. I will try bam2fastx and let you know whether that fixes it.
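
A quick check that does not depend on the conversion tool is to count the FASTA records and the distinct ZMWs they come from. The sketch below rests on two assumptions that are not confirmed in this thread: that read names follow the PacBio pattern `movie/zmw/start_end`, and that the Counts column of the demultiplexing report is per ZMW rather than per read. If both hold, several reads per ZMW would account for a FASTA record count larger than Counts. The function name and the example read names are hypothetical.

```python
def count_reads_and_zmws(fasta_lines):
    """Count FASTA records and distinct ZMWs from an iterable of lines.

    Assumes header names look like 'movie/zmw/start_end' (an assumption
    about the PacBio naming scheme); headers without a '/' are counted
    as records but contribute no ZMW.
    """
    records = 0
    zmws = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        records += 1
        fields = line[1:].split()
        if not fields:
            continue  # empty header, nothing to parse
        parts = fields[0].split("/")
        if len(parts) >= 2:
            zmws.add((parts[0], parts[1]))  # (movie, ZMW number)
    return records, len(zmws)


# Made-up example: three reads, two of them from the same ZMW.
example = [
    ">m1/10/0_500", "ACGT",
    ">m1/10/550_1000", "ACGT",
    ">m1/11/0_700", "ACGT",
]
print(count_reads_and_zmws(example))  # (3, 2)
```

For a real file, pass an open file handle instead of the list, for example `with open(path) as fh: count_reads_and_zmws(fh)`, where `path` is the FASTA produced from one barcode's BAM. If the second number matches Counts, the two tools are simply counting different things.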
