Hi,
I recently received PacBio HiFi reads, generated in CCS mode, for a de novo whole-genome assembly of a plant.
The sequencing facility sent me two file types: a fastq.gz file and a BAM file.
I am confused about two things.
First, as I understand it, PacBio sequencing output is in BAM format by default. However, I suspect the BAM file I received is not the raw file but was produced by the CCS software with a command like " ccs movie.subreads.bam movie.ccs.bam ".
Second, I have used hifiasm to generate primary contigs from the fastq file. Now I want to align the HiFi reads back to the assembly and filter out contigs with a read depth close to 0, and also align the contigs to plant mitochondrial and chloroplast genome sequences to detect organellar contigs. I am not sure which aligner to use. I have come across pbalign and minimap2 for this purpose.
I am new to working with PacBio data. Please let me know if you have any suggestions.
Note: this is WGS HiFi data (not RNA-seq data).
pbmm2 can align either a BAM or a FASTA/FASTQ to a reference genome.
Is your CCS BAM file suffixed with *.ccs.bam or *.hifi_reads.bam? If the latter, it is the HiFi subset of the CCS reads. If the former, it is likely to contain ALL CCS reads in the dataset, not only those that pass the Q20 filter.
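To make the Q20 distinction concrete: a HiFi read is conventionally a CCS read whose predicted accuracy is at least 0.99 (Q20). The sketch below assumes you have already pulled a per-read accuracy value out of the BAM (in CCS output this is normally the rq tag). The function names and the example read names are made up for illustration, so treat it as a rough sketch rather than anything official.

```python
import math

HIFI_MIN_ACCURACY = 0.99  # Q20, the conventional HiFi cutoff

def phred(accuracy):
    """Convert a predicted read accuracy (0-1) to a Phred-scaled quality."""
    if accuracy >= 1.0:
        return float("inf")
    return -10.0 * math.log10(1.0 - accuracy)

def split_hifi(read_accuracies, min_accuracy=HIFI_MIN_ACCURACY):
    """Split {read_name: accuracy} into (hifi, non_hifi) lists of names.

    Accuracies are compared directly rather than as Phred values, so a
    read at exactly 0.99 is not lost to floating-point rounding.
    """
    hifi, non_hifi = [], []
    for name, acc in read_accuracies.items():
        (hifi if acc >= min_accuracy else non_hifi).append(name)
    return hifi, non_hifi

# Hypothetical accuracies, as might be read from the rq tag of a *.ccs.bam
reads = {"read/1/ccs": 0.9995, "read/2/ccs": 0.99, "read/3/ccs": 0.97}
hifi, non_hifi = split_hifi(reads)
print(hifi)      # reads that would end up in a *.hifi_reads.bam
print(non_hifi)  # CCS reads below Q20
```

If your FASTQ was exported from the *.hifi_reads.bam, this filtering has already been done for you.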
Working from just the fasta/fastq should be fine for what you're trying to do. There is often additional information in the BAM files necessary for certain analyses (kinetics/basemods) but for your purpose working with the fasta/fastqs should be sufficient.
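For the depth-filtering step, one possible approach (my assumption, not something you have to do) is to summarise the sorted alignment per contig, for example with samtools coverage, and then flag contigs whose mean depth falls below a cutoff. The sketch below only does the flagging part. The example table, the contig names, and the cutoff of 1.0 are all made up for illustration.

```python
import csv
import io

def low_depth_contigs(coverage_tsv, min_mean_depth=1.0):
    """Return names of contigs whose mean depth is below min_mean_depth.

    coverage_tsv: text of a tab-separated table whose header line has
    'rname' (optionally prefixed with '#') and 'meandepth' columns.
    """
    reader = csv.reader(io.StringIO(coverage_tsv), delimiter="\t")
    header = [h.lstrip("#") for h in next(reader)]
    name_col = header.index("rname")
    depth_col = header.index("meandepth")
    flagged = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if float(row[depth_col]) < min_mean_depth:
            flagged.append(row[name_col])
    return flagged

# Made-up table in the column layout of `samtools coverage`
example = (
    "#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq\n"
    "ptg000001l\t1\t5000000\t9000\t5000000\t100\t31.2\t40\t58\n"
    "ptg000002l\t1\t40000\t0\t0\t0\t0\t0\t0\n"
    "ptg000003l\t1\t60000\t2\t30000\t50\t0.4\t40\t12\n"
)
print(low_depth_contigs(example))
```

What counts as "close to 0" depends on your overall coverage, so pick the cutoff after looking at the depth distribution across all contigs.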
I am using HiFi reads to generate the primary assembly, and now I want to map the HiFi reads back to the assembly to remove contigs with 0 reads mapped.
I am confused because I am using CCS-generated HiFi reads. So, what should I do?
You must be using an older version of pbmm2. In recent versions the pbmm2 align --help output shows:
--preset STR Set alignment mode. See below for preset parameter details. Valid choices:
(SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED). [SUBREAD]
HiFi reads are just a subset of CCS reads, so the CCS preset will work here too. The default is SUBREAD. The SUBREAD preset lacks the -u parameter, which disables homopolymer compression, so if you don't specify a preset, homopolymers will be compressed. That will probably have only a minimal or subtle impact on your alignments, so running without any preset probably won't make much of a difference.
The pbmm2 command is a little confusing to me. It has a --preset mode with a CCS option.
Should I use the CCS preset here, or just do normal indexing and alignment?
Also, indexing isn't necessary. Just do
pbmm2 align ...
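As a rough illustration of what that call could look like, here is a small Python sketch that only assembles the command line. The file names are hypothetical, and the thread count and the choice to sort the output are my assumptions. If your pbmm2 version does not list HIFI under --preset, pass preset="CCS" instead.

```python
import shlex

def pbmm2_align_command(reference, reads, output_bam, preset="HIFI", threads=8):
    """Build the argv list for a sorted pbmm2 alignment.

    The reference is passed as a plain FASTA; no separate indexing
    step is run beforehand.
    """
    return [
        "pbmm2", "align",
        reference, reads, output_bam,
        "--preset", preset,   # HIFI or CCS for HiFi reads
        "--sort",             # write a coordinate-sorted BAM
        "-j", str(threads),   # number of threads
    ]

# Hypothetical file names
cmd = pbmm2_align_command("asm.p_ctg.fa", "hifi_reads.fastq.gz", "hifi_to_asm.bam")
print(shlex.join(cmd))

# To actually run it (requires pbmm2 on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list keeps the arguments separate, so paths with spaces are passed through safely when it is handed to subprocess.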