Question

How can I split soft-clipped reads and map the split reads again?

6.6 years ago

fatima.m.zare ▴ 30

I have a question regarding unmapped reads. From the SRBreak paper: "If reads are aligned across breakpoints then some parts of them cannot be mapped the first time. These parts are denoted by the ‘S’ character in the CIGAR strings of these reads". 'S' denotes soft clipping: the clipped nucleotides are still present in the read sequence. I can count the 'S' operations in the CIGAR string, but does anybody know how I can split these reads and align the clipped parts to the reference genome again?
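The 'S' operations only say how many bases are clipped; to realign them you first need the clipped bases themselves. Below is a minimal sketch using only the Python standard library, assuming you already have a record's CIGAR and SEQ columns as plain strings (for example from `samtools view` output). The helper names are illustrative, not part of any tool mentioned in this thread.

```python
import re

# A CIGAR string is a run of <length><operation> pairs (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split e.g. '20S80M' into [(20, 'S'), (80, 'M')]."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clip_count(cigar: str) -> int:
    """Total number of soft-clipped bases (sum of all S lengths)."""
    return sum(n for n, op in parse_cigar(cigar) if op == "S")


def soft_clipped_bases(seq: str, cigar: str) -> tuple[str, str]:
    """Return the (leading, trailing) soft-clipped bases of a read.

    Hard-clipped bases (H) are absent from SEQ, so H operations are dropped;
    soft clips (S) can then only sit at the two ends of the remaining ops.
    """
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[:lead], seq[len(seq) - trail:]
```

For example, `soft_clipped_bases("AAACCCCCGG", "3S5M2S")` returns `("AAA", "GG")`: the first three and last two bases are the parts that did not align and are candidates for realignment.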

Tags: soft clipped reads, split reads, unmapped reads • 4.2k views

Question written 6.6 years ago by fatima.m.zare ▴ 30; updated 6.6 years ago by d-cameron ★ 2.9k

Duplicate of: "extracting the soft clipped seq only from a sam file"

Reply by Pierre Lindenbaum 164k, 6.6 years ago

I read this as a slightly different question, since the other one doesn't cover the additional steps required to turn the soft-clipped reads plus the alignments of their clipped bases into split reads. You need to:

1. match the realigned fragments back to their reads
2. drop unmapped fragments; these reads stay as soft-clipped reads
3. rehydrate the sequence and quality scores of the originating read (or write a hard clip)
4. copy all the SAM flags, fields and tags of the original soft-clipped read, except the alignment-specific ones such as RNAME, POS, CIGAR and the NM tag
5. set the supplementary flag (0x800)
6. write SA tags
7. merge the new supplementary records back into the input file at their mapped positions (they were extracted in the order of the primary soft-clipped alignments, not of their own positions)
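The flag and SA-tag steps above can be sketched in a few lines of plain Python. This assumes you have already realigned the fragments and collected, for each read, its primary alignment plus the mapped fragments; the `Alignment` container and function names are hypothetical. The SA element layout (`rname,pos,strand,CIGAR,mapQ,NM;`) and the 0x800/0x10 flag bits come from the SAM specification. Rehydrating SEQ/QUAL and re-sorting are left out.

```python
from dataclasses import dataclass

SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment
REVERSE = 0x10         # SAM FLAG bit: SEQ is reverse complemented


@dataclass
class Alignment:
    """One part of a split read (illustrative container, not a library type)."""
    rname: str
    pos: int       # 1-based leftmost mapping position, as in the SAM POS column
    reverse: bool  # True if aligned to the reverse strand
    cigar: str
    mapq: int
    nm: int        # edit distance (NM tag)


def sa_entry(aln: Alignment) -> str:
    """Format one SA element: rname,pos,strand,CIGAR,mapQ,NM;"""
    strand = "-" if aln.reverse else "+"
    return f"{aln.rname},{aln.pos},{strand},{aln.cigar},{aln.mapq},{aln.nm};"


def sa_tags(parts: list[Alignment]) -> list[str]:
    """Return the SA tag for each part; parts[0] must be the primary alignment.

    Every record lists all the *other* parts, so each supplementary record's
    SA tag starts with the primary, following the usual SAM convention.
    """
    return [
        "SA:Z:" + "".join(sa_entry(other) for j, other in enumerate(parts) if j != i)
        for i in range(len(parts))
    ]


def supplementary_flag(primary_flag: int, aln: Alignment) -> int:
    """Copy the primary's FLAG, mark it supplementary and set its own strand bit."""
    flag = primary_flag | SUPPLEMENTARY
    return (flag | REVERSE) if aln.reverse else (flag & ~REVERSE)
```

Keeping the primary first in the list is a deliberate choice: it makes the supplementary record's SA tag point back to the primary first, which is what downstream tools conventionally expect.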

Reply by d-cameron ★ 2.9k, 6.6 years ago

score 3 · Accepted Answer · 2018-04-11

6.6 years ago

d-cameron ★ 2.9k

I have written a tool to do exactly this. gridss.SoftClipsToSplitReads extracts the clipped bases and repeatedly realigns them to the reference with the aligner of your choice (bwa by default). The latest development version (currently undergoing internal testing) also supports realignment of existing split reads (e.g. if you don't like the bwa SA split-read alignments) as well as of entire reads (which I use to validate that my assembly contigs actually originate from where I expect them to).
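The extraction half of that workflow can be illustrated by writing each sufficiently long soft clip to FASTQ so an external aligner can map it. This is not GRIDSS's implementation, just a standard-library sketch that assumes SAM text lines as input; the 20-base minimum and the `<qname>#<mate><end>` naming scheme are hypothetical choices made so that hits can be matched back to their read and end.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
MIN_CLIP_LEN = 20  # hypothetical cut-off: very short fragments rarely map uniquely
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def clip_lengths(cigar):
    """(leading, trailing) soft-clip lengths, ignoring hard clips."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def clips_to_fastq(sam_lines, min_len=MIN_CLIP_LEN):
    """Yield one FASTQ record per sufficiently long soft clip.

    Names follow a made-up scheme, <qname>#<mate><end>, where mate is 1/2
    (0 for unpaired reads) and end is L or R.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag, cigar, seq, qual = f[0], int(f[1]), f[5], f[9], f[10]
        # Only primary, mapped records with stored bases and qualities.
        if flag & SKIP_FLAGS or "*" in (cigar, seq, qual):
            continue
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        lead, trail = clip_lengths(cigar)
        for end, start, stop in (("L", 0, lead), ("R", len(seq) - trail, len(seq))):
            if stop - start >= min_len:
                yield f"@{qname}#{mate}{end}\n{seq[start:stop]}\n+\n{qual[start:stop]}\n"
```

The resulting file can then be aligned with, for example, `bwa mem your_reference.fa clips.fq > clips.sam` (after `bwa index your_reference.fa`). Keep in mind that SEQ is stored in reference orientation for reverse-strand records, so the strand of each fragment hit is relative to the stored sequence, not the original read.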

Answer by d-cameron ★ 2.9k, 6.6 years ago

Thanks. I have read your paper and your GitHub repository. I have a BAM file and a reference FASTA file, and I want to realign the soft-clipped bases in my BAM file with the BWA aligner. I think I should use SoftClipsToSplitReads, but unfortunately I don't know how to run it. Could you please give me a straightforward way to do that?

Reply by fatima.m.zare ▴ 30, 6.6 years ago

The simplest command line looks like this:

java -Xmx512M -cp gridss-VERSION-with-dependencies.jar gridss.SoftClipsToSplitReads I=your_input.bam O=your_output.bam REFERENCE_SEQUENCE=your_reference.fa
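To sanity-check the result, you can count how many records in the output are supplementary and how many carry an SA tag. A minimal sketch, assuming the realigned fragments are written as supplementary records with SA tags (as in the steps listed earlier in this thread) and that you feed it the text lines of `samtools view your_output.bam`; the function name is hypothetical.

```python
from collections import Counter

SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def summarize_split_reads(sam_lines):
    """Count records, supplementary records and records carrying an SA tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines if present
        fields = line.rstrip("\n").split("\t")
        counts["records"] += 1
        if int(fields[1]) & SUPPLEMENTARY:
            counts["supplementary"] += 1
        # Optional tags start at the 12th column (index 11).
        if any(tag.startswith("SA:Z:") for tag in fields[11:]):
            counts["with_SA"] += 1
    return counts
```

Running the same count on the input and the output BAM gives a rough idea of how many soft clips were converted into split reads.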

Reply by d-cameron ★ 2.9k, 6.6 years ago