A constant problem with SAM file header
1
0
Entering edit mode
6.1 years ago

Hi! I have mapped my RNA-seq reads on reference genome by bowtie2 after trimming Illumina adapters by Trimmoatic as below,

/bowtie2$ bowtie2 -p 4 -x Dicty_primary_genomic -U trialoutput.fq -S trialoutput.sam 


6619872 reads; of these:
  6619872 (100.00%) were unpaired; of these:
    303213 (4.58%) aligned 0 times
    2426695 (36.66%) aligned exactly 1 time
    3889964 (58.76%) aligned >1 times
95.42% overall alignment rate

The header of my SAM file (I have sorted SAM) looks so;

/samtools-1.9$ head trialoutput.sorted.sam
@HD VN:1.0  SO:coordinate
@SQ SN:DDB0216524|DDB_G0267364  LN:2441
@SQ SN:DDB0216528|DDB_G0267372  LN:2644
@SQ SN:DDB0191165|DDB_G0267380  LN:3559
@SQ SN:DDB0216498|DDB_G0267304  LN:4580
@SQ SN:DDB0202373|DDB_G0267338  LN:2486
@SQ SN:DDB0216520|DDB_G0267356  LN:6002
@SQ SN:DDB0307434|DDB_G0269812  LN:2651
@SQ SN:DDB0266571|DDB_G0269818  LN:2362
@SQ SN:DDB0302581|DDB_G0267990  LN:2561

As you are considering, pretty different with bowtie2 tutorial on example data as below;

@HD VN:1.0  SO:unsorted
@SQ SN:gi|9626243|ref|NC_001416.1|  LN:48502
@PG ID:bowtie2  PN:bowtie2  VN:2.0.1
r1  0   gi|9626243|ref|NC_001416.1| 18401   42  122M    *   0   0   TGAATGCGAACTCCGGGACGCTCAGTAATGTGCG  +"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>  AS:i:-5 XN:i:0  XM:i:3  XO:i:0  XG:i:0  NM:i:3  MD:Z:59G13G21G26    YT:Z:UU
r2  0   gi|9626243|ref|NC_001416.1| 8886    42  275M    *   0   0   NTTNTGATGCGGGCTTGTGGAGTTCAG

I just wondered if this means that my SAM file is empty or any thing going wrong with my mapping?

Thanks a lot for any help

software error SAM bowtie2 • 3.3k views
ADD COMMENT
2
Entering edit mode

I'm not seeing the glaring problem. Your reference has multiple sequences, theirs has one. Your have repetitive elements; most of your reads map equally well to multiple places, but if that's your reference, then that's your reference. But your mapping hardly failed.

ADD REPLY
1
Entering edit mode

You should scroll further down in your own file. At some point you should start seeing the reads. head is only showing you top 10 lines.

ADD REPLY
2
Entering edit mode
6.1 years ago
h.mon 35k

I wonder what you consider what you consider "pretty different". The only difference I can see is the bowtie2 tutorial example has a short header section, as the reference consists of only one contig - the reference is the complete genome of Enterobacteria phage lambda. Your reference apparently consists of tens or hundreds of short contigs, so it is quite long and head doesn't show any mapped reads. To see the first few mapped reads:

samtools view trialoutput.sorted.sam | head

This won't show header lines, just the mapping section. See also samtools flagstat and samtools stats to see mapping statistics.

ADD COMMENT

Login before adding your answer.

Traffic: 1904 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6