Why do these two reads cause segmentation fault in Cufflinks?
7.5 years ago

I have 50 bp paired-end RNA-seq data, and 4 of my ~200 samples fail with an error when I run Cufflinks (v2.2.1). Here is my pipeline prior to running Cufflinks:

  1. STAR 2-pass (GRCm38)
  2. Added read group information with Picard
  3. Marked (but did not remove) duplicates with Picard
  4. SplitNCigarReads with GATK
  5. Realigned around indels with GATK
  6. Performed base recalibration with GATK

I've been looking at one particular sample with the error and have narrowed it down to a pair of reads (see below):

PC168224:108:C90EBANXX:5:1303:19181:37569   179 6   85461128    60  43M345H =   85461508    338 CCCTCAGATCATCATCCGAGCTTTCCGCACAGCCACCCAATTG C46FE@3CDECDBDD>/C6FE?FD@:EA?5CDBB>33<::;<B MC:Z:380H4I4S   BD:Z:KOOOOMMLMMLMMLLMMLNNJFLLMNNNMOPPQSSQVSSKMMM    PG:Z:MarkDuplicates RG:Z:HL126  NH:i:1  BI:Z:JNOOSQOQNOOMNONJKKOMIBKNJNMOOPQLOPQNUSSNNMM    HI:i:1  nM:i:1  MQ:i:70 AS:i:97 XS:A:+

PC168224:108:C90EBANXX:5:1303:19181:37569   179 6   85461508    70  380H4I4S    =   85461128    -338    GGTGGCTG    AC?9=5EB    MC:Z:43M345H    OC:Z:380H4M4S   BD:Z:ONNOPPMM   PG:Z:MarkDuplicates RG:Z:HL126  NH:i:1  BI:Z:NORNPPMM   HI:i:1  nM:i:1  MQ:i:60 AS:i:97 XS:A:+

Now, if I put those two reads into a separate BAM file (with the original header from the sample) and run Cufflinks, I get the following:

cufflinks -o test. test.bam
You are using Cufflinks v2.2.1, which is the most recent release.
[14:04:40] Inspecting reads and determining fragment length distribution.
Segmentation fault

I've looked at those two reads but am not sure what about them causes a segmentation fault. Any ideas?

RNA-Seq • 1.9k views
7.5 years ago

This is a bad CIGAR string:

380H4I4S

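There is no "Match/M" operation in this CIGAR string: 380 hard-clipped bases, 4 inserted bases, and 4 soft-clipped bases. None of these three operations consumes any reference bases, so the read does not actually cover any position on the reference.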
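To make the point concrete, here is a minimal sketch that computes how many reference bases a CIGAR string spans. It assumes the SAM convention that only the M, D, N, = and X operations consume the reference; the helper name `reference_length` is made up for illustration and is not part of Cufflinks or any other tool.

```python
import re

# One CIGAR element: a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference (assumed from the SAM convention).
REF_CONSUMING = set("MDN=X")


def reference_length(cigar):
    """Return the number of reference bases spanned by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


print(reference_length("43M345H"))   # the first read of the pair
print(reference_length("380H4I4S"))  # the problematic mate
```

The first read spans 43 reference bases, while the problematic mate spans none, which is what makes it suspicious.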


Ah, that seems to be it. Replacing the CIGAR string with the following fixes the issue:

380H4I4M

Now to figure out why there are bad cigar strings in some of my samples...


I think it's still wrong anyway, because an alignment cannot start with an insertion.
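A rough way to test for this second problem is to ignore clipping and look at the first remaining operation. This is only a sketch of the check described in the comment above; `starts_with_insertion` is a hypothetical helper, and whether a given tool rejects such records is not something this snippet can tell you.

```python
import re

# One CIGAR element: a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def starts_with_insertion(cigar):
    """True if the first operation after any soft/hard clipping is an insertion."""
    ops = [op for _, op in CIGAR_OP.findall(cigar) if op not in "SH"]
    return bool(ops) and ops[0] == "I"


print(starts_with_insertion("380H4I4M"))  # the hand-edited CIGAR from the test above
print(starts_with_insertion("43M345H"))   # the first read of the pair
```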


You're probably right. I was just testing it out to see if what you said would fix the issue. I am going to see how systematic these bad CIGAR strings are in my files, and if it is only a few reads then I'll just remove them.
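One way to gauge how widespread the problem is would be to scan the alignments as SAM text and list the mapped records whose CIGAR spans no reference bases. The sketch below works on any iterable of SAM lines; the function name `find_zero_span_records` and the two toy records (abbreviated from the reads above, with `*` in place of sequence and qualities) are illustrative assumptions only.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def find_zero_span_records(sam_lines):
    """Return (read name, CIGAR) for mapped records that span no reference bases."""
    bad = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        qname, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":    # unmapped or CIGAR unavailable
            continue
        span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
        if span == 0:
            bad.append((qname, cigar))
    return bad


# Toy input: a header line plus two abbreviated records.
toy = [
    "@HD\tVN:1.4\tSO:coordinate",
    "readA\t179\t6\t85461128\t60\t43M345H\t=\t85461508\t338\t*\t*",
    "readA\t179\t6\t85461508\t70\t380H4I4S\t=\t85461128\t-338\t*\t*",
]
for name, cigar in find_zero_span_records(toy):
    print(name, cigar)
```

On real data the same function could be given `sys.stdin` with the output of `samtools view` piped in, and the resulting read names used to decide whether to filter those records out.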
