6.9 years ago
pmarijon
Hi,
I want to read a sequence file (FASTA, FASTQ, BAM, etc.), so I read the SeqAn tutorial. To know my position in the file (to generate a progress bar), I need to use std::ifstream. That is not a problem; I wrote this test code:
#include <iostream>
#include <fstream>
#include <seqan/seq_io.h>

int main (int argc, char ** argv) {
    std::streampos begin, end;
    std::ifstream myfile (argv[1], std::ios::in | std::ios::binary);
    begin = myfile.tellg();

    seqan::SeqFileIn seq_file(myfile);
    seqan::CharString id;
    seqan::Dna5String seq;
    seqan::CharString qual;

    while(!seqan::atEnd(seq_file))
    {
        seqan::readRecord(id, seq, qual, seq_file);
        std::cout << "pos: " << myfile.tellg() << " id " << id << std::endl;
    }

    end = myfile.tellg();
    myfile.close();

    std::cout << "begin: " << begin << " end: " << end << std::endl;
    std::cout << "size is: " << (end-begin) << " bytes.\n" << std::endl;
    return 0;
}
But when I try this code on a compressed FASTQ file, SeqAn throws an exception: terminate called after throwing an instance of 'seqan::ParseError'
My questions:
- Is using std::ifstream the only way to get the current position in the file?
- How can I tell SeqAn that this stream is a compressed stream?
- Can I generate an uncompressed stream from my compressed stream (with SeqAn or zlib)?
Thanks.
Why would you want to know the position of a FASTQ record in a compressed file? Unless you're using BGZF, there is no way to 'fseek' in a gzip file...
I want to generate a progress bar; the post needed an edit. For a compressed file, we can get a good approximation from the size of the compressed file and the current position in it.
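To make the approximation concrete, here is a minimal sketch using only the C++ standard library. The helper names fileSizeBytes and progressPercent are hypothetical (they are not part of SeqAn); the idea is simply to divide the current byte position in the compressed file by the compressed file's total size.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: size of a file in bytes, or -1 if it cannot be opened.
// Opening with std::ios::ate places the read position at the end of the file,
// so tellg() reports the file size.
long long fileSizeBytes(const std::string& path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    if (!f) return -1;
    return static_cast<long long>(f.tellg());
}

// Hypothetical helper: progress as a percentage, clamped to [0, 100].
// `position` is the number of compressed bytes read so far and `total`
// is the size of the compressed file.
double progressPercent(long long position, long long total) {
    if (total <= 0 || position <= 0) return 0.0;
    if (position >= total) return 100.0;
    return 100.0 * static_cast<double>(position) / static_cast<double>(total);
}
```

The percentage is only an estimate of how many records have been parsed, because the compression ratio is not constant along the file and the reader usually buffers ahead of the parser.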
Then I would create a custom std::streambuf to count the number of bytes, e.g.: https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/
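A minimal sketch of such a counting std::streambuf, using only the standard library, could look like this. The class name CountingStreambuf, its methods, and the demoCountLines function are hypothetical names chosen for illustration; this is not taken from the linked article and is not part of SeqAn.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: forwards reads to another streambuf and counts bytes.
class CountingStreambuf : public std::streambuf {
public:
    explicit CountingStreambuf(std::streambuf* source) : source_(source) {
        assert(source != nullptr);
    }

    // Bytes fetched from the source so far (can run ahead of the reader
    // by up to one internal buffer).
    std::size_t bytesRead() const { return count_; }

    // Bytes actually handed to the reader: fetched bytes minus the part
    // of the internal buffer that has not been consumed yet.
    std::size_t bytesConsumed() const {
        return count_ - static_cast<std::size_t>(egptr() - gptr());
    }

protected:
    // Called by the stream when the internal buffer is exhausted:
    // refill it from the source and update the counter.
    int_type underflow() override {
        std::streamsize n = source_->sgetn(buffer_, sizeof(buffer_));
        if (n <= 0) return traits_type::eof();
        count_ += static_cast<std::size_t>(n);
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* source_;
    std::size_t count_ = 0;
    char buffer_[4096];
};

// Demonstration: read all lines through the counting buffer and
// return the number of bytes consumed.
std::size_t demoCountLines(const std::string& data) {
    std::istringstream raw(data);
    CountingStreambuf counter(raw.rdbuf());
    std::istream in(&counter);
    std::string line;
    while (std::getline(in, line)) {}
    return counter.bytesConsumed();
}
```

In a real program the source would be the rdbuf() of the std::ifstream opened on the (compressed) file, and the std::istream built on the counter would be what the parser reads from. Whether SeqAn accepts and decompresses such a wrapped stream is an assumption that needs to be checked against the SeqAn documentation.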
I use a std::ifstream to get the current position in the file during SeqAn parsing; that part is easy. But when I try my code on a compressed file, SeqAn parsing fails. So either SeqAn did not detect that my stream contains compressed data, or SeqAn cannot work on a compressed stream, but that isn't documented.
Usually it is the other way round: Things don't work on compressed data, unless documented.
It is documented.
Source : https://seqan.readthedocs.io/en/master/Tutorial/InputOutput/SequenceIO.html
Well, there is compressed .bam and compressed .gz. And bz2 according to http://docs.seqan.de/seqan/master/group_FileCompressionTags.html#FileCompressionTags%23BgzFile