How to convert mapping information to SAM format
10.2 years ago
bitjunkie ▴ 40

I have a mapping tool I wrote that outputs mapping information in a tab-delimited format. Unfortunately, my tool does not output SAM format. I want to use other tools that require SAM files to analyze my output. I was wondering if there is a command-line tool that will take mapping information as arguments and output it in SAM format. It's fine if I have to do this one mapping at a time; that's just a matter of writing a script.

Yes, I could learn everything about SAM format and write code to do this but that takes brain power and time.

Thanks.

sequencing alignment next-gen • 2.9k views
ADD COMMENT

If you already have the mapping information computed, then it seems like it would be less work and time to output it to the SAM text format (which isn't all that complicated compared to writing a mapping algorithm), rather than finding an existing tool that will translate text information to SAM text and then tailoring your output to that tool's requirements. All you need to do is take a look at the SAM specification and work your way through it column-by-column.

One thing you could do is use samtools view to sanity-check your output. You could pretty easily make a unit-testing jig like this, which would help you work out the kinks in your output.
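A rough sketch of such a jig in Python, assuming `samtools` is on your PATH. The helper name `samtools_accepts` and the example record are made up for illustration. It pipes SAM text into `samtools view` and reports whether the exit status was zero.

```python
import shutil
import subprocess

# Hypothetical example input: a header plus one alignment line.
EXAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\t*\n"
)

def samtools_accepts(sam_text):
    """Return True/False for samtools' verdict, or None if samtools is missing."""
    if shutil.which("samtools") is None:
        return None  # cannot check without samtools installed
    result = subprocess.run(
        # -S is needed by old samtools releases and ignored by newer ones.
        ["samtools", "view", "-S", "-h", "-"],
        input=sam_text,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    print(samtools_accepts(EXAMPLE_SAM))
```

Feeding it your tool's real output instead of `EXAMPLE_SAM` gives a quick pass/fail check while you work through the columns.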

ADD REPLY

Thanks Deedee. That's not the answer I was hoping for, but you kind of convinced me that it might not be that complicated. My mapping information is kind of a special case, however, since it is alignment-free, i.e. I don't use an alignment algorithm as a base method. Basically, my mapping information consists of "x read maps to y location with z mismatches". I don't know what that would mean for mapping quality and CIGAR strings, for example.

ADD REPLY

I think Pierre has a great approach as well. Depending on what you want to accomplish, you can fudge some of the data. For example, you could just put [number of bases]M as the CIGAR string for every mapped read (indicating that every base is aligned to the reference, whether it matches or mismatches), and have the MAPQ be the same for every mapped read as well. This is essentially what bedToBam is doing. In the end, if you don't care about the accuracy of the majority of the fields, you should just output to BED (which is far simpler) and then use bedToBam.
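A minimal sketch of that fudge in Python, assuming each mapping is a read name, a reference name, a 1-based start position, the read sequence, and a mismatch count. The function name `sam_line` and its parameters are hypothetical. It writes the 11 mandatory SAM columns, uses 255 for MAPQ (the spec's "not available" value), and stores the mismatch count in the optional `NM` tag.

```python
def sam_line(read_name, ref_name, pos_1based, seq, mismatches=None,
             mapq=255, reverse=False):
    """Build one SAM alignment line with a full-length M CIGAR."""
    flag = 16 if reverse else 0   # 16 = read maps to the reverse strand
    cigar = f"{len(seq)}M"        # every base aligned, match or mismatch
    fields = [
        read_name,        # QNAME
        str(flag),        # FLAG
        ref_name,         # RNAME
        str(pos_1based),  # POS (SAM positions are 1-based)
        str(mapq),        # MAPQ
        cigar,            # CIGAR
        "*",              # RNEXT (no mate)
        "0",              # PNEXT
        "0",              # TLEN
        seq,              # SEQ (caller must pass it as it aligns to the forward strand)
        "*",              # QUAL (not stored)
    ]
    if mismatches is not None:
        fields.append(f"NM:i:{mismatches}")  # edit distance to the reference
    return "\t".join(fields)

if __name__ == "__main__":
    print("@SQ\tSN:chr1\tLN:1000")  # header line; the length is a made-up example
    print(sam_line("r1", "chr1", 100, "ACGT", mismatches=1))
```

Most downstream tools also want an `@SQ` header line per reference sequence, with its real length, before the alignment lines.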

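I should mention that, based on what you said, you should be able to easily construct CIGAR strings if you record where the mismatches fall. CIGAR is pretty simple: 10=1X39= means that, for the given read, the first ten bases matched the reference, the eleventh base mismatched, and the next 39 matched. (M on its own covers both matches and mismatches, so 50M is also valid for the same read.)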
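A small sketch of building such a string in Python, assuming your mapper can report the 0-based offsets of the mismatches within the read (a count alone is not enough). The function name `cigar_from_mismatches` is made up. It uses the `=` and `X` operators, which the SAM spec defines as sequence match and sequence mismatch.

```python
def cigar_from_mismatches(read_len, mismatch_positions):
    """Run-length encode matches (=) and mismatches (X) for an ungapped read."""
    mismatched = set(mismatch_positions)
    runs = []  # list of [count, op] pairs
    for i in range(read_len):
        op = "X" if i in mismatched else "="
        if runs and runs[-1][1] == op:
            runs[-1][0] += 1      # extend the current run
        else:
            runs.append([1, op])  # start a new run
    return "".join(f"{count}{op}" for count, op in runs)

if __name__ == "__main__":
    # 50-base read with a mismatch at the eleventh base (offset 10)
    print(cigar_from_mismatches(50, [10]))  # prints 10=1X39=
```

This only covers ungapped mappings; insertions and deletions would need the I and D operators.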

ADD REPLY

Ah I see. Looks like I have to do a little reading regarding bedtools, bed format, and SAM format. Thanks people.

ADD REPLY
10.2 years ago

See bedToBam (from bedtools) for inspiration:

bedToBam converts features in a feature file to BAM format. This is useful as an efficient means of storing large genome annotations in a compact, indexed format for visualization purposes.

ADD COMMENT

I don't have a feature file? Are you talking about genome feature files (.gff)? Can I use this method if my information is equivalent to "x read maps to y location with z mismatches"?

ADD REPLY

BED is a really simple tab-delimited format. For each mapped read:

column 1: chromosome name

column 2: mapping start position (0-based)

column 3: mapping end position (exclusive, i.e. one past the last mapped base)

Those are all that's required for BED. You can add other columns as well, but outputting legit BED is dead simple.
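A quick sketch in Python, assuming your mapper reports a 1-based start position and you know the read length. The function name `bed_line` is hypothetical. BED uses 0-based, half-open coordinates, so the start is shifted down by one and the end is start plus length.

```python
def bed_line(chrom, start_1based, read_len):
    """Format one three-column BED line from a 1-based mapping position."""
    start = start_1based - 1  # BED starts are 0-based
    end = start + read_len    # BED ends are exclusive
    return f"{chrom}\t{start}\t{end}"

if __name__ == "__main__":
    # A 50-base read mapped at position 100 (1-based) on chr1
    print(bed_line("chr1", 100, 50))
```

If your positions are already 0-based, drop the subtraction.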

ADD REPLY
