Hi! I would really appreciate any help. My .sam file, obtained after mapping with minimap2, looks like this:
@HD VN:1.5 SO:coordinate pb:3.0.1
@SQ SN:sgr01 LN:98476147
@SQ SN:sgr02 LN:88342039
@SQ SN:sgr03 LN:76488725
@SQ SN:sgr04 LN:75030901
@SQ SN:sgr05 LN:70255127
(....)
And the test .sam file looks like this:
sample1_HQ_transcript/48 0 chr1 12230030 60 7M1D46M2I37M4070N169M1D3M8149N78M1655N75M1I116M100N81M4605N117M4382N105M2506N96M1I75M483N101M847N90M1I79M2589N102M153N202M751N68M1I1 M1159N134M2178N156M571N211M959N75M3867N207M1974N90M3307N141M2338N123M5955N1I9M1I94M1013N153M2322N91M1I45M1095N48M2170N151M190N211M4599N248M1677N192M307N127M1934N124M1I19M4118N131M221N176M5220N124M875N236M 4047N109M4569N36M1D127M896N109M3379N180M2104N114M811N150M13846N150M1277N97M2120N149M11113N135M1438N168M39170N133M4070N196M37103N58M1I74M9221N241M1799N548M * 0 0 CTGAGCGCGCGGGCCTGCGCCATTGAGGAGCGGCGGGGAGGAAACGCCGCGCAGCGCGCCGGGCTGGGGCGGGCGGCCCGGGACACCGACAGATTTTTCTGTGACCATGAAAGAGAGAAATAAAGAATGATCCATGATTTCTAAACACCTTTTCCTGAGGATATAGTCATGTTGGAAGGCCTTGTAGCCTGGGTTCTCAATACCTATTTGGGAAAATATGTCAATAACTGTGGAGAATATTGAATTAAAAATTCAAGATGTCC (....)
The command I am trying to use works with the test file but not with mine.
Is it possible to reformat my file so that it matches the test file?
You don't have any @HD line(s) at the top of your SAM file header?
I have @HD; sorry, it was removed for some reason, but I have corrected the post.
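A minimal sketch for checking this locally, assuming a plain-text, tab-delimited SAM file read into a string. `header_lines` and `has_hd_first` are hypothetical helper names, not part of minimap2 or Cupcake. Per the SAM specification, an @HD line, if present, must be the first header line.

```python
def header_lines(sam_text: str) -> list[str]:
    """Return the header lines (starting with '@') at the top of a SAM file."""
    header = []
    for line in sam_text.splitlines():
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        header.append(line)
    return header


def has_hd_first(sam_text: str) -> bool:
    """True if the first header line is a tab-delimited @HD line."""
    header = header_lines(sam_text)
    return bool(header) and header[0].startswith("@HD\t")


sam = "@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:sgr01\tLN:98476147\n"
print(has_hd_first(sam))
```

Note that the header lines pasted in the post are space-separated; in a real SAM file they are tab-separated, which is what this check expects.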
Then you need to tell us which command you ran and the exact syntax you used.
It was Cupcake (supporting scripts for Iso-Seq after the clustering step): https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake:-supporting-scripts-for-Iso-Seq-after-clustering-step
I used the command recommended by the tutorial:
It works with the test file, but when I use my own files instead, I get an error:
There are spaces in the CIGAR string, which should be one continuous string. Was that a side effect of copying and pasting it here? I assume you truncated the rest of the sequence and quality scores. If only that one line is causing the issue, what happens if you remove it?
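To find records like this one, the CIGAR column (column 6) can be checked against the pattern given in the SAM specification, `\*|([0-9]+[MIDNSHPX=])+`, which allows no whitespace. A minimal sketch, assuming tab-delimited records; `is_valid_cigar` and `find_bad_cigars` are hypothetical names:

```python
import re

# CIGAR pattern from the SAM specification: '*' or one or more <length><op> pairs.
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHPX=])+")


def is_valid_cigar(cigar: str) -> bool:
    """True if the whole string matches the CIGAR pattern (no spaces allowed)."""
    return CIGAR_RE.fullmatch(cigar) is not None


def find_bad_cigars(sam_text: str) -> list[tuple[int, str]]:
    """Return (line number, QNAME) for records with too few fields or a malformed CIGAR."""
    bad = []
    for lineno, line in enumerate(sam_text.splitlines(), start=1):
        if not line or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.split("\t")
        if len(fields) < 11 or not is_valid_cigar(fields[5]):
            bad.append((lineno, fields[0]))
    return bad
```

The line numbers it returns identify which records to try editing out.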
The CIGAR string is not supposed to be that long. I have seen that happen, though, when the command that makes the SAM file, which is supposed to tell the software which FASTQ was used in the alignment, is given the wrong FASTQ.
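One way to test the wrong-FASTQ explanation is to compare each CIGAR with its SEQ: the operations that consume query bases (M, I, S, =, X) must add up to the length of SEQ, so a mismatch suggests that reads and alignments do not belong together. A minimal sketch, assuming one tab-delimited SAM record per call; the function names are hypothetical. The reference span it also returns includes N (skipped region) operations, which is how spliced transcript alignments can cover long genomic distances.

```python
import re

CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")
QUERY_OPS = set("MIS=X")  # operations that consume query bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def cigar_lengths(cigar: str) -> tuple[int, int]:
    """Return (query length, reference span) implied by a CIGAR string."""
    qlen = rlen = 0
    for count, op in CIGAR_OP_RE.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            qlen += n
        if op in REF_OPS:
            rlen += n
    return qlen, rlen


def seq_matches_cigar(record: str) -> bool:
    """Check one SAM record: the CIGAR query length must equal len(SEQ)."""
    fields = record.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare (e.g. unmapped, or SEQ omitted)
    qlen, _ = cigar_lengths(cigar)
    return qlen == len(seq)


print(cigar_lengths("7M1D46M2I37M4070N169M"))  # prints (261, 4330)
```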
Since minimap2 is being used, it is possible that the input reads are unusually long (this looks like Iso-Seq cDNA sequencing), which could lead to a long CIGAR string like that. I don't have personal experience with this, so I could be wrong.
Thanks to everyone! I found how to fix the problem; the issue was indeed with the mapping.