Question

Does my .sam file look ok? How to convert it?

0

4.6 years ago

tanya_fiskur ▴ 70

Hi! I would really appreciate any help. My .sam file, obtained after mapping with minimap2, looks like this:

@HD     VN:1.5  SO:coordinate   pb:3.0.1
@SQ     SN:sgr01        LN:98476147
@SQ     SN:sgr02        LN:88342039
@SQ     SN:sgr03        LN:76488725
@SQ     SN:sgr04        LN:75030901
@SQ     SN:sgr05        LN:70255127
(....)

And the test .sam file looks like this:

sample1_HQ_transcript/48        0       chr1    12230030        60      7M1D46M2I37M4070N169M1D3M8149N78M1655N75M1I116M100N81M4605N117M4382N105M2506N96M1I75M483N101M847N90M1I79M2589N102M153N202M751N68M1I1    M1159N134M2178N156M571N211M959N75M3867N207M1974N90M3307N141M2338N123M5955N1I9M1I94M1013N153M2322N91M1I45M1095N48M2170N151M190N211M4599N248M1677N192M307N127M1934N124M1I19M4118N131M221N176M5220N124M875N236M    4047N109M4569N36M1D127M896N109M3379N180M2104N114M811N150M13846N150M1277N97M2120N149M11113N135M1438N168M39170N133M4070N196M37103N58M1I74M9221N241M1799N548M  *       0       0       CTGAGCGCGCGGGCCTGCGCCATTGAGGAGCGGCGGGGAGGAAACGCCGCGCAGCGCGCCGGGCTGGGGCGGGCGGCCCGGGACACCGACAGATTTTTCTGTGACCATGAAAGAGAGAAATAAAGAATGATCCATGATTTCTAAACACCTTTTCCTGAGGATATAGTCATGTTGGAAGGCCTTGTAGCCTGGGTTCTCAATACCTATTTGGGAAAATATGTCAATAACTGTGGAGAATATTGAATTAAAAATTCAAGATGTCC (....)

The command, that I try to use, works normally with the test file and doesn't work with mine.

Is it somehow possible to make my file have the same formatting as the test one?

rna-seq sam • 1.7k views

ADD COMMENT • link 4.6 years ago by tanya_fiskur ▴ 70

0

You don't have an @HD line at the top of your SAM file header?

@HD VN:1.6 SO:coordinate

ADD REPLY • link 4.6 years ago by GenoMax 147k

0

I have @HD, sorry; it was removed for some reason, but I have corrected the post.

ADD REPLY • link 4.6 years ago by tanya_fiskur ▴ 70

0

The command, that I try to use, works normally with the test file and doesn't work with mine.

Then you need to tell us what command and the actual syntax you used.

ADD REPLY • link 4.6 years ago by GenoMax 147k

0

It was Cupcake: https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake:-supporting-scripts-for-Iso-Seq-after-clustering-step

I used the command recommended by the tutorial:

collapse_isoforms_by_sam.py --input hq_isoforms.fastq --fq -s hq_isoforms.fastq.sorted.sam --dun-merge-5-shorter -o test

It works with the test file, but if I use my own files instead, I get this error:

Traceback (most recent call last):
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/bin/collapse_isoforms_by_sam.py", line 4, in <module>
    __import__('pkg_resources').run_script('cupcake==11.0.0', 'collapse_isoforms_by_sam.py')
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1464, in run_script
    exec(code, namespace, namespace)
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/lib/python3.7/site-packages/cupcake-11.0.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 235, in <module>
    main(args)
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/lib/python3.7/site-packages/cupcake-11.0.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 185, in main
    for recs in iter: # recs is {'+': list of list of records, '-': list of list of records}
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/lib/python3.7/site-packages/cupcake-11.0.0-py3.7-linux-x86_64.egg/cupcake/tofu/branch/branch_simple2.py", line 81, in iter_gmap_sam
    records = [next(quality_alignments)]
  File "/home/smrtanalysis/anaconda3/envs/anaCogen/lib/python3.7/site-packages/cupcake-11.0.0-py3.7-linux-x86_64.egg/cupcake/tofu/branch/branch_simple2.py", line 113, in get_quality_alignments
    elif r.identity < self.min_aln_identity:
TypeError: '<' not supported between instances of 'NoneType' and 'float'
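
The last line of the traceback is ordinary Python 3 behavior: comparing None with a float raises TypeError, so the identity value for at least one record was None when the filter ran. The sketch below only illustrates that failure mode and a defensive guard. It is not Cupcake's code; the function name and the 0.95 threshold are hypothetical.

```python
def passes_identity_filter(identity, min_identity=0.95):
    """Return True if the alignment identity meets the threshold.

    `identity` may be None when no identity could be computed for a record.
    The name and the default threshold are illustrative assumptions.
    """
    if identity is None:
        # `None < 0.95` raises TypeError in Python 3, so a missing
        # identity is treated as a failed filter instead of compared.
        return False
    return identity >= min_identity
```

A guard like this only hides the symptom. A missing identity usually means the alignment records themselves need a closer look.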

ADD REPLY • link 4.6 years ago by tanya_fiskur ▴ 70

2

There are spaces in the CIGAR string, which should be one continuous string with no whitespace. Was that a side effect of copying and pasting it here? I assume you have truncated the rest of the sequence/Q scores. If just that one line is causing the issue, what happens if you edit it out?
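
One way to check for stray whitespace or other damage in the CIGAR column is to test each record against the CIGAR grammar from the SAM specification. This is a minimal sketch, assuming a plain-text, tab-separated SAM file; the function name is hypothetical.

```python
import re

# CIGAR grammar from the SAM specification: "*" or one or more <length><op> pairs.
CIGAR_RE = re.compile(r"\*|(?:[0-9]+[MIDNSHP=X])+")


def find_bad_cigars(sam_lines):
    """Yield (line_number, read_name, cigar) for records with a malformed CIGAR."""
    for line_number, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Fewer than the 11 mandatory columns: not a valid SAM record.
            yield line_number, fields[0], None
            continue
        cigar = fields[5]
        if CIGAR_RE.fullmatch(cigar) is None:
            yield line_number, fields[0], cigar
```

Usage would be something like `for hit in find_bad_cigars(open("hq_isoforms.fastq.sorted.sam")): print(hit)`, which prints nothing when every CIGAR is well formed.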

ADD REPLY • link 4.6 years ago by GenoMax 147k

1

The CIGAR string is not supposed to be that long. I've seen that happen, though, when the command that makes the SAM file, which is supposed to tell the software which FASTQ was used in the alignment, is given the wrong FASTQ.

ADD REPLY • link 4.6 years ago by swbarnes2 14k

1

Since minimap2 is being used, it is possible that the input reads are unusually long (this looks like Iso-Seq cDNA sequencing), which could potentially lead to a long CIGAR like that. I don't have personal experience with this, so I could be wrong.

ADD REPLY • link 4.6 years ago by GenoMax 147k

0

Thanks to everyone! I have found out how to fix the problem; the issue was indeed with the mapping.

ADD REPLY • link 4.6 years ago by tanya_fiskur ▴ 70

score 2 · Accepted Answer · 2020-04-10

2

4.6 years ago

swbarnes2 14k

The first thing you posted looks like the header of a proper sam file. The second thing superficially looks like one of the non-header lines, but the CIGAR string is out of control. Something was done wrong there, even if it superficially "worked". You should post all the command lines you used.
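
A quick sanity check on a suspicious alignment line is the rule from the SAM specification that the lengths of the M, I, S, = and X operations must add up to the length of SEQ. N operations (skipped reference regions, typically introns in spliced alignments) do not consume read bases. A minimal sketch, with hypothetical function names:

```python
import re

CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that consume read bases


def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume read bases."""
    return sum(
        int(length)
        for length, op in CIGAR_OP_RE.findall(cigar)
        if op in QUERY_CONSUMING
    )


def seq_matches_cigar(sam_line):
    """Return True if SEQ length agrees with the CIGAR of one alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare
    return cigar_query_length(cigar) == len(seq)
```

A record that fails this check was probably written with a different read sequence than the one that was aligned.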

ADD COMMENT • link 4.6 years ago by swbarnes2 14k

1

Yes, right. If your SAM output file contains only a header, something went wrong with the mapping: maybe the input data format, maybe the mapping command.
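
To see whether a SAM file is header-only, or mostly unmapped, you can count its line types. A minimal sketch, assuming a plain-text SAM file; the function name is hypothetical, and bit 0x4 of the FLAG column marks an unmapped read per the SAM specification.

```python
def summarize_sam(sam_lines):
    """Count header lines, mapped records, and unmapped records."""
    counts = {"header": 0, "mapped": 0, "unmapped": 0}
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            counts["header"] += 1
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x4:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts
```

For example, `summarize_sam(open("hq_isoforms.fastq.sorted.sam"))` returning zero mapped and zero unmapped records would confirm that the file holds only a header.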

ADD REPLY • link 4.6 years ago by coolliuteng ▴ 10