no header in SAM file
1
0
Entering edit mode
2.6 years ago
Maxine ▴ 50

I use the minimap2 to do alignment for my pacbio long-reads DNA data against reference genome. Here is the command I use below:

minimap2 -t 30 -2 -I 100g -ax map-pb genome.fa sample_name.fq > aln.sample_name.sam 

to align sample_name.fa to reference genome.fa. And use samtools read the sam file with

samtools view aln.sample_name.sam | head

Here is the content of the first line:

m64291e_211224_041609/0/42782_42995 16 chr6 116495378 18 4S18M1D4M1D8M1I3M2I26M1D54M5I27M2I14M2I3M2I6M1I8M1I3M3I10M6S * 0 0 GTGCGCCTTTTTTATGGTTGGGTTATAGAGGGAGCCTATGAGATTTCTTGGTATTTTAGACTATTGAGTTGCCTTCCACCCTGCAGATCCCTCGCACTCCCCCGTACTGGATGTAACCCCGTGGTAGGACATGATTTTGCCTCCACCAAACTTCTCACTGTTTTCCTGGTGTATTAATCTCAGCCTCCATGGCGTCCGGTTCCAGTATGGTTT !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! NM:i:32 ms:i:226 AS:i:216 nn:i:0 tp:A:P cm:i:5 s1:i:52 s2:i:0 de:f:0.1122 rl:i:0

It seems that there is no header at all.

the raw data sample_name.fq is convert from raw bam file by samtools bam2fq, I wonder maybe some mistakes were taken in the fq in this step?

Has anyone had a similar situation to mine?

minimap samtools pacbio • 3.1k views
ADD COMMENT
3
Entering edit mode
2.6 years ago
Billy Rowell ▴ 330

samtools view ... does not output the header by default. Try samtools view -h ....

ADD COMMENT
0
Entering edit mode

Wow, It works! Thank you! I'm such an idiot.

Here's one more thing I noticed, the headers dominate 749 lines! Is that normal? Here are line 743 to 749 below:

@SQ     SN:original_scaffold_1837_pilon LN:9511
@SQ     SN:original_scaffold_906_pilon  LN:9172
@SQ     SN:original_scaffold_2215_pilon LN:8750
@SQ     SN:original_scaffold_2026_pilon LN:7171
@SQ     SN:original_scaffold_1113_pilon LN:4052
@PG     ID:minimap2     PN:minimap2     VN:2.24-r1122   CL:minimap2 -t 30 -2 -I 5g -ax map-pb /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa /home/maxine91/projects/def-jfu/data/Z03_fq.dir/cy201704.fq
@PG     ID:samtools     PN:samtools     PP:minimap2     VN:1.13 CL:samtools view -h aln.cy201704.sam 
ADD REPLY
1
Entering edit mode

No problem!

It's hard to know exactly what to expect to see in your header without knowing the entire workflow, but it looks like you aligned against reference with many contigs. Each of those contigs will have an @SQ line in the header. I'm guessing that's the majority of these 749 header lines.

ADD REPLY
0
Entering edit mode

yes! the reference genome contains 749 contigs. Thank you so much!

ADD REPLY

Login before adding your answer.

Traffic: 1659 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6