samtools collate
17 months ago
Sara ▴ 30

Hi all,

I am using samtools collate to convert my BAM files to paired-end FASTQ files. Here is the command I am using:

samtools view -h -T mm10.fa {input.bam} | samtools collate -O -u -@ {threads} - | samtools fastq -1 output_paired1.fq.gz  -2 output_paired2.fq.gz -0 /dev/null -s /dev/null - 2>>{log}

But when I run it, it only shows me the usage of samtools collate, as if the command is not correct:

Usage:   samtools collate [-Ou] [-n nFiles] [-l cLevel] <in.bam> <out.prefix>

Options:
      -O       output to stdout
      -u       uncompressed BAM output
      -l INT   compression level [1]
      -n INT   number of temporary files [64]
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
      --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]

I cannot figure out how to correct my command line. Does the prefix matter? I thought it was optional. I would be grateful if anyone could help me.

Thank you in advance!


it shows me the usage of samtools collate is not correct

What is the exact error message?


There is no error message; it just gives me:

Usage:   samtools collate [-Ou] [-n nFiles] [-l cLevel] <in.bam> <out.prefix>

Options:
      -O       output to stdout
      -u       uncompressed BAM output
      -l INT   compression level [1]
      -n INT   number of temporary files [64]
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
      --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 0 reads

What is the output of samtools view {input.bam} | head?


It displays a few lines of the BAM file.


Try adding STDOUT at the end to the collate command like so:

| samtools collate -O -u -@ {threads} - STDOUT |

Any string will do, it just needs something to prefix tmp files. Also, you don't need the view command at all.
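
As a rough sketch of what the two suggestions above amount to on an older samtools (before 1.8): drop the view step and give collate any string as its last argument. The function name bam_to_paired_fastq, the file names, and the .collate_tmp suffix are placeholders for illustration, not anything defined by samtools.

```bash
#!/usr/bin/env bash
# Sketch: BAM -> paired FASTQ without the leading `samtools view`.
# bam_to_paired_fastq and the ".collate_tmp" suffix are hypothetical names.

bam_to_paired_fastq() {
    local bam=$1 r1=$2 r2=$3 threads=${4:-1}
    # On samtools < 1.8 the last positional argument of collate is mandatory,
    # even with -O; it is only used to name the temporary files.
    samtools collate -O -u -@ "$threads" "$bam" "${bam%.bam}.collate_tmp" \
        | samtools fastq -1 "$r1" -2 "$r2" -0 /dev/null -s /dev/null -
}

# Usage:
#   bam_to_paired_fastq sample.bam sample_R1.fq.gz sample_R2.fq.gz 4
```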


The usage doc needs to be better, then: the angle brackets around the out.prefix parameter are the only hint that it could be required, even when common sense says it's not. One would expect a tool built by lh3 to either state this explicitly or fall back to a default prefix when none is provided.


What does "- STDOUT" (which @Ram mentioned) do? Is it the prefix for the temporary files? And how do I know when I need to use STDOUT? As for the view command, I wanted to use ref.fa for proper interpretation of the alignment positions; isn't that necessary?


Reference is only needed with .cram format files.


What is 'proper interpretation of alignment positions'? No, it is not necessary.


The first samtools view is useless.

17 months ago
Sara ▴ 30

Thank you everyone for all the help. It seems the problem was the version of samtools (the environment I used for samtools collate had samtools 1.6). Updating samtools solved the problem.

Also, it is better to specify a PREFIX for the temporary files.

samtools collate -O -u -@ {threads} {input.bam} 2>{log} | samtools fastq -1 {output.R1} -2 {output.R2} -0 /dev/null -s /dev/null 2>>{log}

You say it is better to use a PREFIX, but your example command line does not show one. Also, why do you need a log file for the collate step?


Here it is with a prefix:

samtools collate -O -u -@ {threads} {input.bam} -T {input.bam}_prefix 2>{log} | samtools fastq -1 {output.R1} -2 {output.R2} -0 /dev/null -s /dev/null 2>>{log}

I used the log to check for errors (I am running this under Snakemake). For example, I received the following error in my log file:

samtools collate: Couldn't write to intermediate file "/tmp/collate70898.0002.bam"

So I then added -T PREFIX to control where the temporary files are written.
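
The "Couldn't write to intermediate file" message suggests the default temporary location was not usable in that environment. One way to sidestep that, sketched here under assumptions, is to build the -T prefix inside a freshly created directory. The helper name make_collate_prefix and the /scratch path in the usage comment are hypothetical; -T is used the same way as in the command above.

```bash
#!/usr/bin/env bash
# Sketch: build a temp-file prefix for `samtools collate -T` inside a fresh,
# writable directory. make_collate_prefix is a made-up helper name.

# Print a prefix inside a new directory under BASE (default: $TMPDIR, else /tmp).
make_collate_prefix() {
    local base=${1:-${TMPDIR:-/tmp}}
    local dir
    dir=$(mktemp -d "$base/collate.XXXXXX") || return 1
    printf '%s/tmp\n' "$dir"
}

# Usage (the /scratch path is only an example location):
#   prefix=$(make_collate_prefix "/scratch/$USER") || exit 1
#   trap 'rm -rf "$(dirname "$prefix")"' EXIT
#   samtools collate -O -u -@ 4 -T "$prefix" in.bam \
#       | samtools fastq -1 R1.fq.gz -2 R2.fq.gz -0 /dev/null -s /dev/null -
```

If mktemp fails because the base directory is missing or not writable, the helper returns non-zero before samtools is ever started, which makes the cause easier to spot than a failure partway through collate.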


Yes, the prefix was made optional in >=1.8. See https://github.com/samtools/samtools/releases/tag/1.8
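
To check which case applies in a given environment, a small sketch along these lines could compare the installed version against 1.8. The helper name collate_needs_prefix is made up for illustration; the 1.8 cutoff comes from the release notes linked above, and the parsing in the usage comment assumes the first line of samtools --version looks like "samtools 1.6".

```bash
#!/usr/bin/env bash
# Sketch: decide whether `samtools collate` still needs the positional
# <out.prefix>. collate_needs_prefix is a hypothetical helper name.

# Succeeds (exit 0) when VERSION (e.g. "1.6" or "1.10.2") is older than 1.8.
collate_needs_prefix() {
    local version=$1
    local major=${version%%.*}       # text before the first dot
    local rest=${version#*.}         # text after the first dot
    local minor=${rest%%[!0-9]*}     # leading digits of the remainder
    [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "${minor:-0}" -lt 8 ]; }
}

# Usage (requires samtools on PATH):
#   ver=$(samtools --version | awk 'NR==1 {print $2}')
#   if collate_needs_prefix "$ver"; then
#       echo "samtools $ver: pass an explicit prefix to collate"
#   fi
```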

Please accept your answer to mark the post as solved.
