Hello everyone, in my initial experiment, I aligned my cDNA reads from an ONT MinION run using the following command:
./minimap2 -ax map-ont /home/my_reference_PCR.fasta /home/input_pcr_PI4K.fastq > output_PI4K_aligned.sam
Then I built the BAM file. I noticed that the alignments covered only one specific region of the reference, at 10X coverage. To examine other regions, I extracted the reads of maximum length from the FASTQ file using the following command:
seqkit seq -m $(seqkit stats -T /home/input_pcr_PI4K.fastq | tail -n1 | cut -f 8) /home/input_pcr_PI4K.fastq -o /home/output_PI4K.fastq
However, when I align this new file using either the -ax map-ont or the -x splice preset (since it's a cDNA sequence), I get 0% mapping according to samtools flagstat. I can't figure out why. Is there something wrong with the extraction, or do I need to adjust the alignment parameters further? I hope you can help me. Thanks a lot.
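One thing worth checking on the extraction step: column 8 of `seqkit stats -T` is `max_len`, and `seqkit seq -m` keeps reads at or above the given length, so the command above keeps only the read(s) whose length equals the longest read in the file. The sketch below is a minimal way to see how many reads that is. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function name `fastq_max_len_summary` is made up for illustration.

```shell
#!/bin/sh
# Summarise read lengths in a FASTQ file and count how many reads
# have the maximum length (the reads a ">= max_len" filter would keep).
# Assumption: 4-line FASTQ records, so every 2nd line of 4 is a sequence.
fastq_max_len_summary() {
    awk 'NR % 4 == 2 {
             len = length($0)          # length of this read
             n++                       # total number of reads
             count[len]++              # reads seen per length
             if (len > max) max = len  # track the longest read
         }
         END {
             printf "reads=%d max_len=%d reads_at_max=%d\n", n, max, count[max]
         }' "$1"
}

# Hypothetical usage with the file from the post:
# fastq_max_len_summary /home/input_pcr_PI4K.fastq
```

If `reads_at_max` comes back as 1, the extracted file holds a single read, and an unusually long read in an amplicon run may well be a chimera or other artifact rather than a representative sequence.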
I checked and found no matches on BLAST. I keep changing various parameters but I get the same result. What if I did a de novo assembly, using Canu since I work with long reads?
Did you check BLASTN vs GenBank (NT) too? It could help to give a few more details about the species and samples involved. Which flowcell and basecaller versions were used? Were spike-in controls used? Have adapters been trimmed? Until now, I was convinced that the sequencers don't simply make up sequences out of thin air. So, yes, a de novo assembly may be something to try.
I checked both and found no match. The sequences were already cleaned by the MinKNOW settings. Here are the flowcell and basecalling model:
Flowcell_id:ALJ911_R9.4.1 basecall_model_version_id: 2021_05_17_dna_r9.4.1_minion_384_d37a2ab9
I know, I also think it's a contaminant. The idea of de novo assembly came to mind because I don't know what else to think. Perhaps the only option is to repeat the PCR.
That is odd, but it could be a basecalling artifact. If you don't mind, could you post the "ghost" sequences, or at least a few examples?
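For posting a few example reads, or pasting them into BLAST, something like the sketch below may help. It converts the first few FASTQ records to FASTA using only awk. It assumes 4-line FASTQ records, and the function name `fastq_head_to_fasta` is hypothetical.

```shell
#!/bin/sh
# Print the first N reads of a FASTQ file as FASTA.
# $1 = FASTQ file, $2 = number of reads to emit (defaults to 5).
# Assumption: 4-line FASTQ records (header, sequence, "+", qualities).
fastq_head_to_fasta() {
    awk -v n="${2:-5}" '
        NR % 4 == 1 {                 # header line, starts with "@"
            if (++seen > n) exit      # stop once N reads were printed
            print ">" substr($0, 2)   # replace leading "@" with ">"
        }
        NR % 4 == 2 { print }         # sequence line
    ' "$1"
}

# Hypothetical usage with the file from the post:
# fastq_head_to_fasta /home/output_PI4K.fastq 3 > examples.fasta
```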
Here is the original fastq: https://drive.google.com/file/d/1LO_rw7SvsqQWZPPXQ7S49L-w1vKHKx39/view?usp=drive_link
Here is the fastq with the extracted max-length reads: https://drive.google.com/file/d/1QQrxu0vBk_Uf-ac3T1tuKSKm1uejksrC/view?usp=drive_link
I've authorized you to access