Entering edit mode
2.6 years ago
Maxine
▴
50
I use the minimap2 to do alignment for my pacbio long-reads DNA data against reference genome. Here is the command I use below:
minimap2 -t 30 -2 -I 100g -ax map-pb genome.fa sample_name.fq > aln.sample_name.sam
to align sample_name.fa to reference genome.fa. And use samtools read the sam file with
samtools view aln.sample_name.sam | head
Here is the content of the first line:
m64291e_211224_041609/0/42782_42995 16 chr6 116495378 18 4S18M1D4M1D8M1I3M2I26M1D54M5I27M2I14M2I3M2I6M1I8M1I3M3I10M6S * 0 0 GTGCGCCTTTTTTATGGTTGGGTTATAGAGGGAGCCTATGAGATTTCTTGGTATTTTAGACTATTGAGTTGCCTTCCACCCTGCAGATCCCTCGCACTCCCCCGTACTGGATGTAACCCCGTGGTAGGACATGATTTTGCCTCCACCAAACTTCTCACTGTTTTCCTGGTGTATTAATCTCAGCCTCCATGGCGTCCGGTTCCAGTATGGTTT !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! NM:i:32 ms:i:226 AS:i:216 nn:i:0 tp:A:P cm:i:5 s1:i:52 s2:i:0 de:f:0.1122 rl:i:0
It seems that there is no header at all.
the raw data sample_name.fq is convert from raw bam file by samtools bam2fq, I wonder maybe some mistakes were taken in the fq in this step?
Has anyone had a similar situation to mine?
Wow, It works! Thank you! I'm such an idiot.
Here's one more thing I noticed, the headers dominate 749 lines! Is that normal? Here are line 743 to 749 below:
No problem!
It's hard to know exactly what to expect to see in your header without knowing the entire workflow, but it looks like you aligned against reference with many contigs. Each of those contigs will have an
@SQ
line in the header. I'm guessing that's the majority of these 749 header lines.yes! the reference genome contains 749 contigs. Thank you so much!