3.3 years ago
eimanpharmacist
I just received PacBio sequences of the 16S gene from the core facility. After downloading, I got 5 files for each sample with the following suffixes:
.ccs.bam
longest.bam
scraps.bam
subreads.bam
whitelist
From my reading, I believe that I should start with the ccs.bam files (demultiplexed aligned sequences) and convert them into fastq files for downstream analysis in R using the DADA2 package. Is that correct?
I installed bam2fastq on my Linux machine, and I am writing this post to ask for support, including links and materials on the protocol for what to do next.
Thanks!
I strongly recommend that you read the PacBio file format documentation or ask your core facility. To get help with further analysis, I suggest you include details about what you have tried so far, what you have, and what you want to obtain from your data. Such analyses are not a trivial task.
I would use samtools to convert the ccs.bam to fastq:
$ samtools fastq reads.ccs.bam >reads.ccs.fastq
and then feed that fastq file into DADA2 like the example here
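Before moving on to DADA2, it may be worth a quick sanity check that the conversion produced a plausible number of reads. A minimal sketch, assuming the FASTQ uses the usual four lines per record (which is what samtools fastq writes); the function name count_fastq_reads is just an illustrative helper, not part of any tool:

```shell
# Hypothetical helper: count records in an uncompressed FASTQ file.
# Assumes exactly four lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}
```

Calling count_fastq_reads reads.ccs.fastq prints the number of reads in that file.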
I have another question: I am converting ccs.bam files into fastq files one by one, and I am wondering if there is a way to do this in batch, since I have lots of sequences.
There is the mistake. Beliefs are for the church, not for analysis. Joke aside, ask the core facility what the files are and how exactly they were generated. At best, they have someone with experience in the analysis whom you might catch for a Zoom call to explain how to get started. As for the question of how to do that in batch, google loops in bash, e.g. the for loop, and spend some quality time on Unix basics.
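For the batch question, a bash for loop around the samtools command from the earlier answer could look roughly like this. It is only a sketch: it assumes samtools is on your PATH and that all files end in .ccs.bam and sit in one directory; the function name convert_ccs_dir is made up for illustration.

```shell
# Hypothetical helper: convert every *.ccs.bam in a directory to FASTQ.
# Assumes samtools is installed and on PATH.
convert_ccs_dir() {
    local dir="${1:-.}"   # directory to scan; defaults to the current one
    local bam out
    for bam in "$dir"/*.ccs.bam; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$bam" ] || continue
        out="${bam%.bam}.fastq"   # sample.ccs.bam -> sample.ccs.fastq
        echo "Converting $bam -> $out" >&2
        samtools fastq "$bam" > "$out" || return 1
    done
}
```

Running convert_ccs_dir with no argument processes the current directory; pass a directory path to process files elsewhere. Each FASTQ is written next to its BAM, and the loop stops at the first file samtools fails on.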