How can I split soft clipped reads and map the split reads again?
6.6 years ago

I have a question regarding unmapped reads. From the SRBreak paper: "If reads are aligned across breakpoints then some parts of them cannot be mapped the first time. These parts are denoted by the ‘S’ character in the CIGAR strings of these reads". 'S' denotes soft clipping: the clipped nucleotides are still present in the read. I can find the number of 'S' characters in the CIGAR. Does anybody know how I can split these reads and align the clipped parts to a reference genome again?

soft clipped reads. split reads unmapped reads • 4.2k views

I interpreted this as a slightly different question, as the other question didn't cover the additional steps required to turn the soft clipped reads + alignments into split reads. You need to:

  • match the fragments back to their reads

  • drop unmapped fragments - these reads stay as soft clipped reads

  • rehydrate the sequence and quality scores of the originating read (or write a hard clip)

  • replace all the SAM flags, fields and tags with those of the original soft clipped read, except the alignment-specific ones such as RNAME, POS, CIGAR, and the NM tag

  • set supplementary flag

  • write SA tags

  • merge the new supplementary reads back into the input file in their mapped position (they were extracted according to the position of the primary soft clipped alignment)
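
To make the flag and SA tag steps above a bit more concrete, here is a minimal Python sketch. It is not how any particular tool implements this; the dict-based records and the function names `sa_entry` and `link_split_read` are hypothetical and chosen only for illustration. It assumes the realigned fragment already carries a CIGAR padded with clips so that it describes the whole read. The supplementary flag value (0x800) and the SA entry layout (rname,pos,strand,CIGAR,mapQ,NM;) follow the SAM specification.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
REVERSE = 0x10         # SAM flag bit: read aligned to the reverse strand


def sa_entry(rec):
    """Format one SA tag entry for a record: rname,pos,strand,CIGAR,mapQ,NM;"""
    strand = "-" if rec["flag"] & REVERSE else "+"
    return f'{rec["rname"]},{rec["pos"]},{strand},{rec["cigar"]},{rec["mapq"]},{rec["nm"]};'


def link_split_read(primary, fragment):
    """Build a supplementary record from a realigned fragment (hypothetical helper).

    Read-level fields (name, pairing flags, ...) are copied from the original
    soft clipped record; alignment-specific fields come from the fragment.
    Returns (updated_primary, supplementary), each pointing at the other via SA.
    """
    supp = dict(primary)
    for key in ("rname", "pos", "cigar", "mapq", "nm"):
        supp[key] = fragment[key]
    # Keep the read-level flag bits, take the strand from the fragment,
    # and mark the record as supplementary.
    supp["flag"] = (primary["flag"] & ~REVERSE) | (fragment["flag"] & REVERSE) | SUPPLEMENTARY
    supp["sa"] = sa_entry(primary)
    updated_primary = dict(primary)
    updated_primary["sa"] = sa_entry(supp)
    return updated_primary, supp


primary = {"qname": "r1", "flag": 99, "rname": "chr1", "pos": 1000,
           "mapq": 60, "cigar": "60M40S", "nm": 0}
fragment = {"flag": 16, "rname": "chr5", "pos": 5000,
            "mapq": 30, "cigar": "60H40M", "nm": 1}
updated_primary, supp = link_split_read(primary, fragment)
print(updated_primary["sa"])  # SA entry describing the supplementary alignment
print(supp["flag"], supp["sa"])
```

Rehydrating SEQ and QUAL, and merging the new records back in coordinate order, are left out of this sketch.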

6.6 years ago
d-cameron ★ 2.9k

I have written a tool to do exactly this. gridss.SoftClipsToSplitReads extracts the clipped bases and repeatedly realigns them to the reference with the aligner of your choice (bwa by default). The latest development version (currently undergoing internal testing) also supports realignment of existing split reads (e.g. if you don't like the bwa SA split read alignment) as well as of the entire read (which I use to validate that my assembly contigs actually originate from where I expect them to).
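
For anyone who wants to see what "extracting the clipped bases" means at the level of a single SAM record, here is a rough Python sketch. It is not the GRIDSS implementation; the function name `soft_clips` is made up for illustration, and it works on plain SEQ, QUAL and CIGAR strings rather than on a BAM library.

```python
import re

# One CIGAR operation: a length followed by an operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(seq, qual, cigar):
    """Return the soft clipped ends of a read as {"left"/"right": (bases, quals)}.

    Hard clips (H) are skipped because hard clipped bases are absent from SEQ.
    SEQ is stored in reference orientation, so for reverse-strand alignments
    the returned bases are reverse-complemented relative to the sequenced read.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    clips = {}
    if ops and ops[0][1] == "S" and ops[0][0] > 0:
        n = ops[0][0]
        clips["left"] = (seq[:n], qual[:n])
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] > 0:
        n = ops[-1][0]
        clips["right"] = (seq[-n:], qual[-n:])
    return clips


clips = soft_clips("ACGTACGTAA", "IIIIIIIIII", "3S5M2S")
print(clips)
```

Each extracted fragment could then be written out as a FASTQ record, named after the originating read, and passed to an aligner.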

Thanks. Actually, I have read your paper and your GitHub repository. I have a BAM file and a reference FASTA file. I want to realign the soft clipped bases of my BAM file with the BWA aligner. I think I should use SoftClipsToSplitReads, but unfortunately I don't know how to run your program. Could you please show me a straightforward way to do that?

The simplest command-line looks like the following:

java -Xmx512M -cp gridss-VERSION-with-dependencies.jar gridss.SoftClipsToSplitReads I=your_input.bam O=your_output.bam REFERENCE_SEQUENCE=your_reference.fa
