Hi.
I would like to understand the screen produced by "samtools tview". I may not be googling with the right keywords, but I just can't find any document explaining the meaning of ".", ",", underlined characters, etc.
Thanks in advance.
See the C code in samtools/bam_tview.c:
c = bam1_strand(p->b)? ',' : '.'
"." is for a matching base on the reverse strand, "," is for a matching base on the forward strand.
if (((p->b->core.flag&BAM_FPAIRED) && !(p->b->core.flag&BAM_FPROPER_PAIR)) || (p->b->core.flag & BAM_FSECONDARY)) attr |= A_UNDERLINE;
So a read is underlined when it is paired but not mapped in a proper pair, or when it is a secondary alignment.
I disagree with @Pierre: "," is the negative strand and "." is the positive strand.
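Here is a small sketch of how the two characters and the underline can be derived from a read's FLAG. It assumes that bam1_strand() tests the 0x10 (reverse strand) bit, as in the samtools headers. The bit values are the ones defined in the SAM specification; match_char and is_underlined are names made up for this illustration, not samtools functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* FLAG bits as defined in the SAM specification. */
#define BAM_FPAIRED       0x1   /* read is paired in sequencing */
#define BAM_FPROPER_PAIR  0x2   /* read is mapped in a proper pair */
#define BAM_FREVERSE      0x10  /* read is mapped to the reverse strand */
#define BAM_FSECONDARY    0x100 /* secondary (not primary) alignment */

/* Hypothetical helper: character drawn for a base that matches the reference. */
static char match_char(uint16_t flag)
{
    return (flag & BAM_FREVERSE) ? ',' : '.';
}

/* Hypothetical helper: mirrors the underline condition quoted from bam_tview.c. */
static bool is_underlined(uint16_t flag)
{
    bool paired_not_proper = (flag & BAM_FPAIRED) && !(flag & BAM_FPROPER_PAIR);
    return paired_not_proper || (flag & BAM_FSECONDARY) != 0;
}
```

For example, a FLAG of 83 (paired, proper pair, reverse strand, first in pair) gives "," with no underline, while 81 (the same read but not in a proper pair) gives an underlined ",".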
You could try to separate the reads by strand using the bitwise flag.
samtools view -b -f 0x10 aln.bam > aln.neg.bam
would give you all the reads mapped to the negative strand, and
samtools view -b -F 0x10 aln.bam > aln.pos.bam
would give the positive-strand reads.
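To make the two filters concrete, here is a minimal sketch of the test applied to each record's FLAG: -f keeps a read only if every requested bit is set, and -F drops a read if any listed bit is set. The function name passes_flag_filter is hypothetical and is not part of samtools.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would a record with this FLAG be kept by
 * a filter equivalent to "-f required -F excluded"? */
static bool passes_flag_filter(uint16_t flag, uint16_t required, uint16_t excluded)
{
    bool has_all_required = (flag & required) == required; /* -f: all bits set */
    bool has_no_excluded  = (flag & excluded) == 0;        /* -F: no bit set   */
    return has_all_required && has_no_excluded;
}
```

With required = 0x10 and excluded = 0 this keeps only negative-strand reads; with required = 0 and excluded = 0x10 it keeps only positive-strand reads.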
Then you could use samtools tview to see them (index each file with samtools index first, since tview needs an indexed BAM). Because of splicing, RNA-seq reads often have an operation such as 123N in their CIGAR string, which marks a stretch of the reference that the read skips (an intron), i.e. the read spans a splice junction.
You would see the following for the positive strand:
CATCACTGGTTTAAAGACAAACTTGCATTGTGAGATTCCAAAATAACAACAACAAAAAACAATTTGCATTGAGAACATTTTGAAG
.........A.......
.....>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
.....>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
You would see the following for the negative strand:
TTTCATTTGCAAGTAATCGATTTAGGTTTTTGATTTTAGGGTTTTTTTTTGTTTTGAACAGTCCAGTCAAAGTACAAATCGAGAG
...KK....KKK..KK.K.K...K........K....K..................KKKK.........K...K...........
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<,,,,,,,,t,,,c,,,,,,,,,,
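In the screens above, the runs of > and < appear to be how tview draws the region skipped by an N operation, on the positive and negative strand respectively. As a rough sketch of what an operation like 123N means, the function below adds up the reference bases skipped by N operations in a CIGAR string. The name cigar_skipped_bases is made up for this example, and the set of operation letters is the one listed in the SAM specification.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: total number of reference bases skipped by N
 * operations in a CIGAR string such as "40M123N45M".
 * Returns -1 if the string is empty or malformed. */
static long cigar_skipped_bases(const char *cigar)
{
    long skipped = 0;
    long len = 0;
    int have_digits = 0;

    if (cigar == NULL || *cigar == '\0')
        return -1;

    for (const char *p = cigar; *p != '\0'; ++p) {
        if (isdigit((unsigned char)*p)) {
            len = len * 10 + (*p - '0');
            have_digits = 1;
        } else {
            /* Every operation needs a length and a known operation letter. */
            if (!have_digits || strchr("MIDNSHP=X", *p) == NULL)
                return -1;
            if (*p == 'N')
                skipped += len;
            len = 0;
            have_digits = 0;
        }
    }
    /* Trailing digits without an operation letter are malformed. */
    return have_digits ? -1 : skipped;
}
```

A read with CIGAR 40M123N45M therefore covers 40 + 123 + 45 reference positions, of which the middle 123 are the skipped intron.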
The strand info comes from the aligner. The aligner builds its index from the genome.fasta file, and the sequences in genome.fasta are recorded in the 5'->3' direction. So if the read as sequenced matches the genome.fasta sequence, it is considered mapped to the positive strand; if its reverse complement matches, it is considered mapped to the negative strand. At least, that is my understanding.
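As a toy illustration of that idea (exact substring matching only, which is far simpler than what a real aligner does), the sketch below reports "+" when the read occurs in the reference as is and "-" when only its reverse complement occurs. All function names here are hypothetical, and only upper-case A, C, G, T are handled; anything else is treated as N.

```c
#include <stddef.h>
#include <string.h>

/* Complement of a single upper-case base; anything else becomes 'N'. */
static char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return 'N';
    }
}

/* Writes the reverse complement of seq into out,
 * which must hold at least strlen(seq) + 1 bytes. */
static void reverse_complement(const char *seq, char *out)
{
    size_t n = strlen(seq);
    for (size_t i = 0; i < n; ++i)
        out[i] = complement(seq[n - 1 - i]);
    out[n] = '\0';
}

/* Toy strand call: '+' if the read occurs in ref as is, '-' if only its
 * reverse complement occurs, '?' if neither does or the read is too long. */
static char toy_strand(const char *ref, const char *read)
{
    char rc[256];

    if (strlen(read) >= sizeof rc)
        return '?';
    if (strstr(ref, read) != NULL)
        return '+';
    reverse_complement(read, rc);
    return strstr(ref, rc) != NULL ? '-' : '?';
}
```

With the reference CATCACTGGTTTAAAGAC, the read ACTGGTT is found directly and is called "+", while CTTTAAA is found only as its reverse complement TTTAAAG and is called "-".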
Oh it's clear now :->
Thank you Pierre!