Arioc (read mapping) ref sequence length error
2 days ago
Michael • 0

I am really impressed with the speed increase in the GPU-enabled read mapper, Arioc.

However, I am finding a discrepancy between the length (in nucleotides) of the input FASTA records (the reference genome, whether supplied as a multi-FASTA file or as single-record FASTA files) and the length reported for the same records after Arioc encoding. This prevents me from using the resulting SAM/BAM files in downstream applications (e.g. GATK).

I can run the Scerevisiae example files as provided with the Arioc download, and the reported lengths are correct. I have used these example .cfg files as a strict template with my own FASTA files, but each of the FASTA records in the output shows the same (truncated) length of 10485759. I have also tried many other configurations, but all give the same LN=10485759.

Is 10485759 the maximum length of a FASTA record that can be read? Has anyone else encountered this problem?
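
One arithmetic observation that may help narrow this down: 10485759 is exactly one less than 10 × 1024 × 1024 (10 MiB). That would be consistent with some fixed-size limit or buffer being hit, but whether Arioc actually has such a limit is an assumption here, not something confirmed from its documentation or source. A quick check in Python:

```python
# The LN value reported for every record in the AriocE output.
reported_ln = 10485759

# One mebi-unit (2**20). Whether Arioc uses a buffer of this kind is an assumption.
MIB = 1024 * 1024

# Does the reported length sit exactly one below a whole number of MiB?
quotient, remainder = divmod(reported_ln + 1, MIB)
print(quotient, remainder)  # prints "10 0"
```

The zero remainder only shows that the number is suspiciously round; it does not by itself identify the cause.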

My input FASTA files seem pretty standard, and can be read correctly by many other programs.

Here are lengths of the input records (in nucleotides):

Chr01   215687109
Chr02   188126098
Chr03   185291080
Chr04   165120918
Chr05   191020454
Chr06   195786439
Chr07   160739793
Chr08   226883875
Chr09   211202930
Chr10   184451305
Chr11   182988052
Chr12   176693890
Chr13   163306629
Chr14   158828433
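
For anyone wanting to reproduce a length table like the one above independently of any aligner, here is a minimal sketch in Python. The function name `fasta_lengths` and the file name in the usage note are hypothetical; the sketch assumes plain (uncompressed) FASTA text and takes the record name to be the first whitespace-delimited token after `>`.

```python
def fasta_lengths(lines):
    """Return {record_name: sequence_length} for an iterable of FASTA lines.

    `lines` can be an open text file handle or a list of strings.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Record name = first token of the header (empty if header is bare).
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: count every character after stripping whitespace.
            lengths[name] += len(line)
    return lengths


# Small in-memory demonstration.
demo = [">Chr01 example", "ACGTAC", "GT", ">Chr02", "GGCC"]
print(fasta_lengths(demo))  # prints "{'Chr01': 8, 'Chr02': 4}"
```

For a real file, something like `with open("genome.fa") as fh: lengths = fasta_lengths(fh)` would apply, where `genome.fa` is a placeholder path.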

And here is the output (.cfg) file:

<?xml version="1.0" encoding="UTF-8"?>
<SAM fn="hsi20_0_30">
    <HD VN="1.6"/>
    <SQ srcId="0" subId="001" rm="Chr01" UR="" LN="10485759" AS="S288C" M5="7ed4be27dbb7bf131f73730e8afe875f" SN="Chr01"/>
    <SQ srcId="0" subId="002" rm="Chr02" UR="" LN="10485759" AS="S288C" M5="6c44c5d5c83d9678b3983047bdba5778" SN="Chr02"/>
    <SQ srcId="0" subId="003" rm="Chr03" UR="" LN="10485759" AS="S288C" M5="8d1130af9c660807090cc2a07ce38dea" SN="Chr03"/>
    <SQ srcId="0" subId="004" rm="Chr04" UR="" LN="10485759" AS="S288C" M5="851abd8f550924d33f914215c46c37fc" SN="Chr04"/>
    <SQ srcId="0" subId="005" rm="Chr05" UR="" LN="10485759" AS="S288C" M5="f61292522bc376c2d306b14e11fc4bc1" SN="Chr05"/>
    <SQ srcId="0" subId="006" rm="Chr06" UR="" LN="10485759" AS="S288C" M5="5b50426ce0a09437abbd424bc3ea08f9" SN="Chr06"/>
    <SQ srcId="0" subId="007" rm="Chr07" UR="" LN="10485759" AS="S288C" M5="8fdbf362f722ef81e7c89c4d1a165474" SN="Chr07"/>
    <SQ srcId="0" subId="008" rm="Chr08" UR="" LN="10485759" AS="S288C" M5="f95125c51c6f00ac4ac16215f6636fb8" SN="Chr08"/>
    <SQ srcId="0" subId="009" rm="Chr09" UR="" LN="10485759" AS="S288C" M5="3733588cc77e79e2a73cd2af4c7b5059" SN="Chr09"/>
    <SQ srcId="0" subId="010" rm="Chr10" UR="" LN="10485759" AS="S288C" M5="9500cde51e37d1e7c09a17403b38f9d4" SN="Chr10"/>
    <SQ srcId="0" subId="011" rm="Chr11" UR="" LN="10485759" AS="S288C" M5="e4ac83591c85946aaa91fef9f5e78179" SN="Chr11"/>
    <SQ srcId="0" subId="012" rm="Chr12" UR="" LN="10485759" AS="S288C" M5="c1abdb1d942a8deafb1eb04111ea28d3" SN="Chr12"/>
    <SQ srcId="0" subId="013" rm="Chr13" UR="" LN="10485759" AS="S288C" M5="a213ea02435b2da8aec958f10324d86c" SN="Chr13"/>
    <SQ srcId="0" subId="014" rm="Chr14" UR="" LN="10485759" AS="S288C" M5="d0e441107536881d402aae13edc47e30" SN="Chr14"/>
    <PG ID="AriocE (hsi20_0_30)" PN="AriocE" VN="1.52.3149.25006" CL="/home/michdeyh/250324_Calaug/AriocE.gapped.cfg" dt="2025-03-23T19:52:02" ms="149637" mJ="*"/>
</SAM>
mapping gpu arioc read • 196 views

I have used these example .cfg files as a strict template with my own FASTA files,

A quick look at the example configuration file for AriocE suggests that you don't need to provide sequence lengths (the aligner should work those out on its own). I'm not sure what you are doing above.


Thanks for looking into that!

Yes, the aligner does indeed calculate the lengths. FYI, the .cfg file I posted above was produced by the aligner -- it makes its own .cfg files as part of the process -- so I am showing how it is somehow miscalculating the actual length of the contigs. Cheers!
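
The mismatch between the aligner-generated .cfg and the true contig lengths can be checked programmatically. The following is a sketch only, using Python's standard `xml.etree.ElementTree`; the function name `find_length_mismatches` is hypothetical, and the element and attribute names (`SQ`, `SN`, `LN`) are taken from the .cfg posted in the question.

```python
import xml.etree.ElementTree as ET


def find_length_mismatches(cfg_xml, expected):
    """Compare LN values in an AriocE-style .cfg against expected lengths.

    cfg_xml  : the .cfg contents (str or bytes)
    expected : {sequence_name: true_length}
    Returns {sequence_name: (true_length, reported_length)} for mismatches.
    """
    root = ET.fromstring(cfg_xml)
    mismatches = {}
    for sq in root.iter("SQ"):
        name = sq.get("SN")
        reported = int(sq.get("LN"))
        if name in expected and expected[name] != reported:
            mismatches[name] = (expected[name], reported)
    return mismatches


# Two-record excerpt modelled on the posted .cfg (other attributes omitted).
cfg_excerpt = """
<SAM fn="hsi20_0_30">
    <SQ SN="Chr01" LN="10485759"/>
    <SQ SN="Chr02" LN="10485759"/>
</SAM>
"""
true_lengths = {"Chr01": 215687109, "Chr02": 188126098}
print(find_length_mismatches(cfg_excerpt, true_lengths))
```

Run against the full .cfg from the question, every one of the 14 records would be flagged, since each reports LN="10485759".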
