Question

If A Read Is Clipped, What Is The Preferred Way To Make Tag Counts?

0

12.5 years ago

KCC ★ 4.1k

I want to write a program that converts SAM files to genome coverage (wiggle or bedgraph format), so my question is about processing the output of the aligner. My program would work a little like the genomeCoverageBed function in bedtools:

genomeCoverageBed -bg -d -ibam reads.bam -g genome.csv

However, I wouldn't have to do the extra step of translating from SAM to BAM.

Now, it's reasonably straightforward to scan through a SAM file and pick out the strand and location of a tag. The length of the read can be inferred. Of course, one will often know the length of the reads anyway.

My question is how to handle hard/soft clipping when determining the length of the tag. Presumably, taking the clipping into account would mean dropping a few bases at the start or the end, giving a shorter read. This would affect the tag count totals in the output of my function.
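To make the length question concrete, here is a minimal sketch (not part of the original post) that derives two lengths from a CIGAR string. It relies on the SAM specification's rule that M, D, N, = and X consume reference bases while M, I, S, = and X consume bases of SEQ; hard-clipped (H) bases are absent from SEQ altogether. The function name `cigar_lengths` is hypothetical.

```python
import re

# One CIGAR element: a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
# Operations that consume bases of SEQ; soft clips (S) count, hard clips (H) do not.
QUERY_CONSUMING = set("MIS=X")


def cigar_lengths(cigar):
    """Return (reference_span, seq_length) implied by a CIGAR string."""
    ref_len = 0
    query_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in REF_CONSUMING:
            ref_len += n
        if op in QUERY_CONSUMING:
            query_len += n
    return ref_len, query_len


# A read with 5 soft-clipped bases, 40 aligned bases and 5 hard-clipped bases:
print(cigar_lengths("5S40M5H"))  # (40, 45)
```

So a length inferred from SEQ alone would include soft-clipped bases, whereas the span on the reference would not.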

In DNA-seq, it seems like it doesn't make much sense to take clipping into account, because the location of the read is what matters. Any feedback would be appreciated.

genome-coverage sam • 3.5k views

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 12.5 years ago by KCC ★ 4.1k

1

I think of read clipping as something that is done by the aligner. Perhaps you are talking about read trimming (prior to alignment)? Could you clarify?

ADD REPLY • link 12.5 years ago by Sean Davis 27k

score 1 · Answer 1 · 2013-02-25

1

12.4 years ago

Istvan Albert 102k

IMO if the read is clipped then the section that was clipped did not cover the genome, so it should not be accounted for in the coverage or in any other manner. I would treat it as if that particular read was shorter.
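A rough sketch of that approach, assuming plain-text SAM records as input: only aligned blocks (M, = and X) add to the depth, so clipped bases are ignored. All function names here are hypothetical, and skipping deletions (D) and reference skips (N) rather than counting them is a choice made for this illustration, not a rule.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Yield 0-based, half-open reference intervals covered by aligned bases."""
    ref = pos - 1  # SAM POS is 1-based and points at the first aligned base
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "M=X":
            yield ref, ref + n
            ref += n
        elif op in "DN":
            ref += n  # advance without counting (an assumption of this sketch)
        # I, S, H and P do not advance along the reference


def coverage_from_sam(lines):
    """Accumulate per-base depth as {chrom: {position: depth}}."""
    depth = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        for start, end in aligned_blocks(pos, cigar):
            for i in range(start, end):
                depth[chrom][i] += 1
    return depth


def to_bedgraph(depth):
    """Merge adjacent positions of equal depth into (chrom, start, end, depth) rows."""
    rows = []
    for chrom in sorted(depth):
        run_start = prev = run_val = None
        for p in sorted(depth[chrom]):
            v = depth[chrom][p]
            if run_start is not None and p == prev + 1 and v == run_val:
                prev = p
                continue
            if run_start is not None:
                rows.append((chrom, run_start, prev + 1, run_val))
            run_start, prev, run_val = p, p, v
        if run_start is not None:
            rows.append((chrom, run_start, prev + 1, run_val))
    return rows


sam = [
    "r1\t0\tchr1\t1\t60\t3S5M\t*\t0\t0\tACGTACGT\t*",
    "r2\t0\tchr1\t4\t60\t5M2S\t*\t0\t0\tACGTACG\t*",
]
for row in to_bedgraph(coverage_from_sam(sam)):
    print(*row, sep="\t")
```

A per-base dictionary is simple but memory-hungry; for a full genome, a difference array per chromosome would be a more practical design.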

ADD COMMENT • link 12.4 years ago by Istvan Albert 102k

0

I was thinking that, at least in DNA-seq, we want to place the fragment. What mechanisms would cause the edges of a read not to map? If the mechanism is corruption of these bases, then we could still use the number of bases to figure out how far the edge of the fragment extends. If instead these are bases appended to the edge of the read, then the number of bases is useless information.

ADD REPLY • link 12.4 years ago by KCC ★ 4.1k

1

Genomic structural variations would be the simplest and most likely explanation.

But even if the cause of clipping were incorrectly called bases or other errors, you should not extend the reads, because doing so generates data that you cannot later distinguish from actually measured values.

ADD REPLY • link 12.4 years ago by Istvan Albert 102k