running segemehl problem
1
0
Entering edit mode
9.8 years ago
biolab ★ 1.4k

Hi everyone,

I am running segemehl to detect circular splicing. Commands as below.

  • Step 1: segemehl.x -d chr1.fa -x chr1.idx
  • Step 2: segemehl.x -d chr1.fa -i chr1.idx -q se.fq --threads 8 -S > mapped.sam
  • Step 3: testrealign.x -d chr1.fa -q mapped.sam -n

Steps 1 and 2 look OK. However, after step3 I found the following errors (the chr1.fa only contains chr1 sequence, no chr 0 as indicated in the error).

Errors are as follows

[SEGEMEHL] Mon Feb  9 21:29:30 2015: reading database sequences.
[SEGEMEHL] Mon Feb  9 21:29:37 2015: 1 database sequences found.
[SEGEMEHL] Mon Feb  9 21:29:37 2015: reading query files.
cluster not found (range check) [55521862,55521862] looking for 55523004 chr 0
cluster not found (range check) [113253473,113253473] looking for 113253082 chr 0
cluster not found (range check) [178269253,178269253] looking for 178359235 chr 0
cluster not found (range check) [55523830,55523830] looking for 55524172 chr 0

The chr1.fa file is shown below.

>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
......

The FASTQ file se.fq is shown below.

@simulate:1/1 length=80
TCTGCTATCGCCCGCATTGGCCACGTACACCTTGCCCAGCGGGTAGATCACAACCAGTGCACAGCAGCCCCCCTCCACTT
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@simulate:2/1 length=80
GTCGGCCTATGAGCCTAATGACCACAGCAGGCTCGGGTGATGGCCACCATTGGGGTGACCCGAGGCTTGGGAGACCACAG
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The file mapped.sam is shown below.

simulate:116870/1 length=80     0       chr1    27709683        255     80M     *       0       0       CCGCCACAGAGCGCAGTCCGCAAGGCGCGGGGTCCCGGAGCCTTCATGGGCTCGCGGGGTGGGGGTGGCCGGGGGCGGTG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!        NM:i:1  MD:Z:43C36      NH:i:1  XI:i:0      XA:Z:Q
simulate:1/1 length=80  0       chr1    113254966       255     80M     *       0       0       TCTGCTATCGCCCGCATTGGCCACGTACACCTTGCCCAGCGGGTAGATCACAACCAGTGCACAGCAGCCCCCCTCCACTT    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!        NM:i:4  MD:Z:0C11T23T3A39       NH:i:1  XI:i:0      XA:Z:Q
simulate:116871/1 length=80     0       chr1    27708331        255     80M     *       0       0       TCCATAGCAAAGCAGCCTTCCCACCCCCTGACTCCCATCAAGACCCCAGCCCCCACTCACCGCTAAATTTCCGATTCAGA    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!        NM:i:1  MD:Z:67C12      NH:i:1  XI:i:0      XA:Z:Q
simulate:818054/1 length=80     16      chr1    12318062        255     80M     *       0       0       ACCTGGGACTTTATGTTGCACCGCGCTCGTGATGCTGTATCTTACATTGACAAATATTTCAACAAGTTAACAGGAGGCCT    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!        NM:i:2  MD:Z:46C23A9    NH:i:1  XI:i:0      XA:Z:Q

Could anyone help? I much appreciated your suggestions. THANKS!

segemehl • 3.1k views
ADD COMMENT
1
Entering edit mode
9.8 years ago

You have to sort the sam file, before giving it to testrealign.

Try the following:

./segemehl.x -d chr1.fa -x chr1.idx
./segemehl.x -d chr1.fa -i chr1.idx -q reads.fastq -S | samtools view -bS - | samtools sort -o - deleteme | samtools view -h - > mapped.sam
./testrealign.x -d hg19.fa -q mapped.sam -n
ADD COMMENT
0
Entering edit mode

Hi David,

Thank you very much for your answer. I have successfully finished the running. I have one further question: I see two new files named splicesites.bed and transrealigned.bed. Their format are as follows.

transrealigned.bed

chr1  165884  165884  distsplice:chr1:703993:1:6:3:L:F   0   +

splicesites.bed

chr1   12057   12179   splits:3:3:3:N:P   0   +

How to determine the circular splicing using these two files? THANKS!

ADD REPLY
1
Entering edit mode

If you don't understand the output format, you can find more information in the publication

Hoffmann S, Otto C, Doose G, Tanzer A, Langenberger D, Christ S, Kunz M, Holdt L, Teupser D, Hackermüeller J, Stadler PF: 'A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and fusion detection', Genome Biology, 15:R34, doi:10.1186/gb-2014-15-2-r34 (2014)

and in the manual. These sources will also explain you how the algorithm works.

For your circular junctions, just search for 'backsplice' in the manual. It should give you the information needed.

ADD REPLY
0
Entering edit mode

Hi David, thank you for your manual link. Now I understand that in field 4, "C" stands for circular.

ADD REPLY
0
Entering edit mode

Hello everyone,

I am new to segemehl and this explanation helped me a lot. I have tried this for the paired end data reads, but getting a error message after running the second command.

Command used:

./segemehl.x -i ce6.idx -d ce6_eg.fa -q SRR2104398_1.fastq -p SRR2104398_2.fastq -S | samtools view -bS - | samtools sort -o - deleteme | samtools view -h - > map_reads.sam

Result:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[SEGEMEHL] Thu Sep 10 11:51:42 2015: reading queries in 'SRR2104398_1.fastq'.
[bam_header_read] EOF marker is absent. The input is probably truncated.

Kindly suggest, although it is reading the queries further but why indicating that the input is truncated.

If the problem is in command then please suggest the correction because after execution of this command when 3rd is executed empty .bed files are obtained.

ADD REPLY
1
Entering edit mode

This is a bug of an old samtools version. With the latest version of samtools you shouldn't get this error message any more. But, even with this error message, the results should be ok.

ADD REPLY

Login before adding your answer.

Traffic: 1853 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6