Splitting SAM/BAM alignments
1
2
Entering edit mode
8.0 years ago

Greetings,

I'm aligning large contigs to a reference genome. I'm hitting upper limit on the number of CIGAR operations (error below). Does anyone have a tool that will cut alignments intelligently? You'd have to add hard clipping to the sequence for the query position to be correct.

[E::sam_parse1] too many CIGAR operations
[W::sam_read1] parse error at line 201
htslib sam bam cigar • 2.6k views
ADD COMMENT
0
Entering edit mode

as far as I know , the java htslib uses a 'int' rather than a 'C' unsigned short. It could be easy to code, but I don't really understand your needs.

ADD REPLY
0
Entering edit mode

Pierre Lindenbaum basically I just need to parse the CIGARS to output deletions, but there are over 65K cigar operations. HTSLIB chokes. So option 1. Use another codebase. option 2: cut up alignments in the sam file.

ADD REPLY
2
Entering edit mode

Everything meant for BAM files will choke, since you can't make a BAM out of that. I fear you'll need to code a custom parser.

ADD REPLY
1
Entering edit mode

Devon Ryan Yup, This is going to be more and more of a common occurrence with assemblies being aligned to genomes.

ADD REPLY
1
Entering edit mode

Yeah, at first I was wondering what sort of crazy stuff you were doing today, but then I thought about long pacbio reads making really really long contigs and...I think you raise a good point.

ADD REPLY
0
Entering edit mode
ADD REPLY
2
Entering edit mode
8.0 years ago

I just need to parse the CIGARS to output deletions, but there are over 65K cigar operations.

Using bioalcidae: https://github.com/lindenb/jvarkit/wiki/BioAlcidae (hoping cigar from SAM are parsed using a int ?) and the following script:

while( iter.hasNext() )
    {
    var rec = iter.next();
    if( rec.getReadUnmappedFlag() ) continue;
    var cigar = rec.getCigar();
    var pos = rec.getAlignmentStart();
    for(var  i=0;i< cigar.numCigarElements();++i) {
        var ce = cigar.getCigarElement(i);
        var op = ce.getOperator();
        ifop.name()=="D") {
            out.println("D\t"+rec.getContig()+"\t"+pos+"\t"+ce.getLength());
            }
        if(op.consumesReferenceBases())
            {
            pos += ce.getLength();
            }
        }
    }

usage:

cat in.sam | java -jar dist/bioalcidae.jar -f script.js -F SAM

ADD COMMENT

Login before adding your answer.

Traffic: 1778 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6