Question

Raw Read count from bam file

0

8.6 years ago

Bioinfonext ▴ 470

I mapped paired-end reads to CDS transcripts using BWA.

If I sort the BAM file without the -n flag, then it works:

For sorting: /home/yog/software/samtools-1.3.1/samtools sort 216_5W_Ca1.bam -O BAM -o 216_5W_Ca1.sort.bam

For indexing: /home/yog/software/samtools-1.3.1/samtools index 216_5W_Ca1.sort.bam

For read count: /home/yog/software/samtools-1.3.1/samtools idxstats 216_5W_Ca1.sort.bam > readcount[bam_idxstats]

But the read counts I am getting look like this:

Rs025080 1341 239 27

Rs035250 621 0 0

Rs035280 408 0 0

Rs035290 318 0 0

Rs035300 456 87 0

Why are there three count columns for each transcript?

RNA-Seq • 7.9k views

ADD COMMENT • link updated 8.6 years ago by Jorge Amigo 14k • written 8.6 years ago by Bioinfonext ▴ 470

1

A quick look into the manual gives you: "Retrieve and print stats in the index file. The output is TAB-delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads".
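To make the four columns concrete, here is a minimal sketch that labels them and totals the two count columns. It uses only the five rows pasted in the question (assumed to be tab-separated, as real idxstats output is); the variable names are made up for illustration. Real idxstats output also ends with a `*` line that holds the unmapped reads not placed on any reference.

```shell
# Sample rows copied from the question; printf repeats the format per row.
idxstats_sample=$(printf '%s\t%s\t%s\t%s\n' \
  Rs025080 1341 239 27 \
  Rs035250 621 0 0 \
  Rs035280 408 0 0 \
  Rs035290 318 0 0 \
  Rs035300 456 87 0)

# Print the rows under a header naming each idxstats column.
printf '%s\n' "$idxstats_sample" | awk -F '\t' '
  BEGIN { print "reference\tlength\tmapped\tunmapped" }
  { print }'

# Column 3 = mapped reads, column 4 = unmapped reads placed on that reference.
total_mapped=$(printf '%s\n' "$idxstats_sample" | awk -F '\t' '{ s += $3 } END { print s + 0 }')
total_unmapped=$(printf '%s\n' "$idxstats_sample" | awk -F '\t' '{ s += $4 } END { print s + 0 }')

echo "mapped=$total_mapped unmapped=$total_unmapped"
```

On a real file the same awk commands could read from `samtools idxstats 216_5W_Ca1.sort.bam` instead of the pasted sample.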

ADD REPLY • link 8.6 years ago by ATpoint 89k

0

How do you have # unmapped reads per chromosome? xD

ADD REPLY • link 8.6 years ago by John 13k

1

Here are two reasons: Samtools Idxstats

ADD REPLY • link 8.6 years ago by ATpoint 89k

0

Yes, the "Read is unmapped but has the chromosome of its mapped pair" case was my first thought, but then the sum of mapped/unmapped per chromosome would be an even number - but for the last entry it is not, so I don't think that's what's going on.

I didn't know about BWA concatenating contigs though, that's new to me - thanks for the info :) And that could certainly be what's causing it. I also didn't know the SAM spec calls for unmapped mates to be placed on the same chromosome as their mapped mate - I always thought they came at the end with all the other unmapped reads. Two biologically meaningless but bioinformatically important bits of information today :)

ADD REPLY • link 8.6 years ago by John 13k

0

It stands for "number" in many manuals.

ADD REPLY • link 8.6 years ago by Matteo Schiavinato ★ 3.7k

2

That's not what John meant, but rather "if a read is unmapped - how do you know it belongs to a certain chromosome" ;-)

ADD REPLY • link 8.6 years ago by WouterDeCoster 48k

0

BWA, for instance, can assign both the unmapped flag and a position, in cases where the read maps somewhere but does not fulfil the criteria for being reported as mapped (e.g. too many mismatches).
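As a rough illustration of how a record can carry the unmapped flag (0x4) and still be listed under a reference name, here is a sketch on six HYPOTHETICAL SAM records reduced to three columns (QNAME, FLAG, RNAME). The read names and flag combinations are invented for the example and are not taken from the poster's data. It tallies records per reference the way the mapped/unmapped columns of idxstats are split.

```shell
# HYPOTHETICAL records: QNAME, FLAG, RNAME only (all other SAM fields omitted).
# pairA: one mate mapped (73), the other unmapped but placed with its mate (133).
# pairB: both mates mapped (99, 147). pairC: both unmapped, no reference (77, 141).
sam_sample=$(printf '%s\t%s\t%s\n' \
  pairA 73  Rs025080 \
  pairA 133 Rs025080 \
  pairB 99  Rs035300 \
  pairB 147 Rs035300 \
  pairC 77  '*' \
  pairC 141 '*')

# Test bit 0x4 with integer arithmetic so this works in any POSIX awk.
per_ref=$(printf '%s\n' "$sam_sample" | awk -F '\t' '
  {
    unmapped_bit = int($2 / 4) % 2        # 1 when FLAG has 0x4 set
    if (unmapped_bit) unmapped[$3]++; else mapped[$3]++
    seen[$3] = 1
  }
  END {
    for (ref in seen)
      printf "%s\t%d\t%d\n", ref, mapped[ref], unmapped[ref]
  }' | LC_ALL=C sort)

# Columns: reference, mapped records, unmapped records.
printf '%s\n' "$per_ref"
```

In this toy input Rs025080 ends up with one mapped and one unmapped record, which is how a single transcript can show a non-zero value in the fourth idxstats column.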

ADD REPLY • link 8.6 years ago by michael.ante ★ 4.0k

0

So the read "almost" maps there :p

ADD REPLY • link 8.6 years ago by WouterDeCoster 48k

0

Here I map raw reads to the CDS transcripts of a draft genome, so around 30% of the reads in a paired-end library do not map to the CDS.

But I am not sure how the numbers of unmapped reads were assigned to individual CDS transcript IDs.

ADD REPLY • link 8.6 years ago by Bioinfonext ▴ 470

0

Could you reformulate this question in a more explicit fashion? I am not sure if I got the point. I also have a question: when you say:

I map raw reads to a CDS transcripts

Do you mean against a transcriptome, or against a genome with a GFF/GTF/GFF3 file that indicates the CDS of the transcripts, and you are considering only those?

ADD REPLY • link 8.6 years ago by Matteo Schiavinato ★ 3.7k

0

I took the CDS from this database, but its genome is not completely sequenced. It's a draft genome.

http://radish-genome.org/Data_resource/

ADD REPLY • link 8.6 years ago by Bioinfonext ▴ 470

0

As far as I know, the notation CDS defines those regions of the exons that are translated, because exons also have UTRs. So you might have some RNASeq reads that belong to UTRs and are therefore not present in your CDS sequences. Could this be the case?

ADD REPLY • link 8.6 years ago by Matteo Schiavinato ★ 3.7k

score 0 · Answer 1 · 2016-12-16

0

8.6 years ago

Vitis ★ 2.6k

I think to get read mapping stats you should use "samtools flagstat" or for the newer version of samtools, "samtools stats".
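A minimal sketch of both suggestions, wrapped in two small helper functions. The function names are hypothetical; `samtools flagstat` and `samtools stats` are the real subcommands, and filtering the stats output on lines starting with `SN` keeps only its "summary numbers" section.

```shell
# Hypothetical helper names; the samtools subcommands themselves are real.

# Whole-file summary: total, mapped, paired, properly paired, and so on.
flagstat_summary() { samtools flagstat "$1"; }

# Keep only the "summary numbers" lines (prefix SN) and drop the prefix column.
stats_summary() { samtools stats "$1" | grep '^SN' | cut -f 2-; }
```

Usage would look like `flagstat_summary 216_5W_Ca1.sort.bam` or `stats_summary 216_5W_Ca1.sort.bam`, with the file name taken from the question.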

ADD COMMENT • link 8.6 years ago by Vitis ★ 2.6k

score 0 · Answer 2 · 2016-12-16

0

8.6 years ago

Jorge Amigo 14k

To get read counts you can use the -c option, plus any other options you want for filtering the reads to count, like -F 4 to count mapped reads only:

samtools view -c input.bam
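For example, the filtered variants could look like the sketch below. The helper names are hypothetical, while `-c`, `-F` and the region argument are standard `samtools view` options. Note that `-c` counts alignment records, so excluding secondary (0x100) and supplementary (0x800) records as well is one way to avoid counting a read more than once.

```shell
# Hypothetical helper names; the samtools view options are real.

# Count records whose unmapped bit (0x4) is not set.
count_mapped_records() { samtools view -c -F 4 "$1"; }

# Also exclude secondary (0x100) and supplementary (0x800) records,
# so each mapped read (each mate of a pair) is counted once.
count_mapped_primary() { samtools view -c -F 0x904 "$1"; }

# Mapped records on a single reference; needs a sorted and indexed BAM.
count_mapped_on_ref() { samtools view -c -F 4 "$1" "$2"; }
```

With the files from the question, `count_mapped_on_ref 216_5W_Ca1.sort.bam Rs025080` should correspond to the third idxstats column for that transcript.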

ADD COMMENT • link 8.6 years ago by Jorge Amigo 14k