Question

Why do these two reads cause segmentation fault in Cufflinks?

1

Entering edit mode

8.2 years ago

murphy.charlesj ▴ 100

I have 50bp PE RNA-seq, and 4 of my ~200 samples end in error after running Cufflinks (v.2.2.1). Here is my pipeline prior to running Cufflinks:

STAR 2-pass (GRCm38)
Added read group information with Picard
Marked (but did not remove) duplicates with Picard
SplitNCigarReads with GATK
Realigned around indels with GATK
Performed base recalibration with GATK

I've been looking at one particular sample with the error and have narrowed it down to a couple reads (see below):

PC168224:108:C90EBANXX:5:1303:19181:37569   179 6   85461128    60  43M345H =   85461508    338 CCCTCAGATCATCATCCGAGCTTTCCGCACAGCCACCCAATTG C46FE@3CDECDBDD>/C6FE?FD@:EA?5CDBB>33<::;<B MC:Z:380H4I4S   BD:Z:KOOOOMMLMMLMMLLMMLNNJFLLMNNNMOPPQSSQVSSKMMM    PG:Z:MarkDuplicates RG:Z:HL126  NH:i:1  BI:Z:JNOOSQOQNOOMNONJKKOMIBKNJNMOOPQLOPQNUSSNNMM    HI:i:1  nM:i:1  MQ:i:70 AS:i:97 XS:A:+

PC168224:108:C90EBANXX:5:1303:19181:37569   179 6   85461508    70  380H4I4S    =   85461128    -338    GGTGGCTG    AC?9=5EB    MC:Z:43M345H    OC:Z:380H4M4S   BD:Z:ONNOPPMM   PG:Z:MarkDuplicates RG:Z:HL126  NH:i:1  BI:Z:NORNPPMM   HI:i:1  nM:i:1  MQ:i:60 AS:i:97 XS:A:+

Now if I put those to reads into a separate BAM file (with original header from the sample) and running cufflinks I get the following:

cufflinks -o test. test.bam
You are using Cufflinks v2.2.1, which is the most recent release.
[14:04:40] Inspecting reads and determining fragment length distribution.
Segmentation fault

I've look at those two reads but am not sure why about them causes a segmentation fault. Any ideas?

RNA-Seq • 2.1k views

ADD COMMENT • link updated 8.2 years ago by Pierre Lindenbaum 166k • written 8.2 years ago by murphy.charlesj ▴ 100

score 3 · Accepted Answer · 2017-05-17

3

Entering edit mode

8.2 years ago

Pierre Lindenbaum 166k

this is a bad cigar string:

380H4I4S

there is not "Match/M" in this cigar string: 380 hard clip, 4 insertions, 4 soft clip. Just 3 things that are not associated to a reference;

ADD COMMENT • link 8.2 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Ah, that seems to be it. Replacing the cigar string with the following fixes the issue:

380H4I4M

Now to figure out why there are bad cigar strings in some of my samples...

ADD REPLY • link 8.2 years ago by murphy.charlesj ▴ 100

0

Entering edit mode

i Think it's still wrong anyway because an alignment cannot start with an insertion.

ADD REPLY • link 8.2 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

You're probably right. I was just testing it out to see if what you said would fix the issue. I' am going to see how systematic an issue these bad cigar strings are in my files, and if it very few reads then I'll just remove them.

ADD REPLY • link 8.2 years ago by murphy.charlesj ▴ 100