samtools sort
1
0
Entering edit mode
2.9 years ago
Javier • 0

I am transforming sam files to bam, to facilitate their ordering I use this command,

% cd /Volumes/GENOMA/BWA 
% samtools sort -n -O V350019555_L03_B5GHUMqcnrRAABA-551.sam | samtools fixmate -m -O bam V350019555_L03_B5GHUMqcnrRAABA-551.bam

but it gives me the following error,

As elsewhere in samtools, use '-' as the filename for stdin/stdout. The input file must be grouped by read name (e.g. sorted by name). Coordinated sorted input is not accepted. (base) javier@iMac-de-JAVIER BWA %

I use the version of samtools 1.10

sort ordering • 3.7k views
ADD COMMENT
0
Entering edit mode
2.9 years ago
samtools sort -n -l 0  -O BAM  -T tmp in.sam |  samtools fixmate -m -O BAM  -  out.bam
ADD COMMENT
0
Entering edit mode

I have done what you have told me and it has generated 24 files in bam format between 25 and 240 MB. But what interests me is having only one. Wouldn't each file be a chromosome? What do I have to do to have a single file?

ADD REPLY
0
Entering edit mode

your pipeline crashed. You're looking at the temporary files.

ADD REPLY
0
Entering edit mode

and what do I have to do? what can it be due to?

ADD REPLY
0
Entering edit mode

You will need to restart the job. Delete the .tmp files before you restart the job. If it fails agains then you either do not have enough RAM or disk space to complete this job.

ADD REPLY
0
Entering edit mode

I'm doing what you tell me. 32 bam files are generated, but after about three minutes, the 32 bam files are deleted, showing this message in terminal

(base) javier@iMac-de-JAVIER BWA % samtools sort -n -l 0 -O BAM -T tmp V350019555_L03_B5GHUMqcnrRAABA-551.sam | samtools fixmate -m -O BAM - V350019555_L03_B5GHUMqcnrRAABA-551.bam
[bam_sort_core] merging from 33 files and 1 in-memory blocks...
(base) javier@iMac-de-JAVIER BWA % 
ADD REPLY
0
Entering edit mode

That means the job was done. You should now have a sorted BAM file with mates fixed. What do you see with

ls -lh V350019555_L03_B5GHUMqcnrRAABA-551.bam 

and then for this command

samtools view -H V350019555_L03_B5GHUMqcnrRAABA-551.bam | head -4
ADD REPLY
0
Entering edit mode

If I actually do what you say, and the sequences grouped into 3 chromosomes appear. I don't know why in 3, and not in all because I thought that the readings of the entire file are of the entire genome

Now I try to remove the duplicate reads, and I get this error

(base) javier@iMac-de-JAVIER BWA % samtools markdup -r -S V350019555_L03_B5GHUMqcnrRAABA-551.bam V350019555_L03_B5GHUMqcnrRAABA-551.bam

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
[markdup] error: truncated input file.
(base) javier@iMac-de-JAVIER BWA % 
ADD REPLY
0
Entering edit mode

If I actually do what you say, and the sequences grouped into 3 chromosomes appear. I don't know why in 3, and not in all because I thought that the readings of the entire file are of the entire genome

That is because I asked you to provide only top 4 lines in output by using the head -4 command.

Providing a word description does not help us with debugging the problem. Please show the output of the commands I had asked for in previous comment.

ADD REPLY
0
Entering edit mode

This is that I watch in the terminal

@HD VN:1.6  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559

What is LN?

ADD REPLY
0
Entering edit mode

LN is length of that chromosome in bases. So it does look like you have a co-ordinate sorted BAM file.

samtools markdup -r -S V350019555_L03_B5GHUMqcnrRAABA-551.bam V350019555_L03_B5GHUMqcnrRAABA-551.bam

Do I see the same name in that command line for both input and output? You can't use the same name for input and output. Assuming you have not corrupted BAM file use a different output name for samtools markdup command for output. If the file has become corrupt you will need to re-do the previous step.

ADD REPLY
0
Entering edit mode

If indeed it was what you said, and I have given it another name

I have not yet downloaded it in igv, but I have done a quality control, but no values appear

what do you think? keep in mind, that sol is a file and I have to merge it with 7 others, (by the way, I already know how to merge, it works the way you said the other day)

(base) javier@iMac-de-JAVIER BWA % samtools flagstat 3C.bam
0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (N/A : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5
ADD REPLY
0
Entering edit mode

Unfortunately flagstat output is no good. It is saying you have nothing in your BAM file. Go backwards (if needed all the way to the original fastq files) and check every file for size. Make sure the file sizes are in line with amount of data. Somewhere along the way something failed and you have been dragging empty data files along.

ADD REPLY
0
Entering edit mode

It generates some files, but they are deleted How do I get a bam file?

_L03_B5GHUMqcnrRAABA-551.sam | samtools fixmate -m -O BAM - V350019555_L03_B5GHUMqcnrRAABA-551.bam

[bam_sort_core] merging from 33 files and 1 in-memory blocks...

ADD REPLY
0
Entering edit mode

My advice still stands. Check every step of your analysis. Snippets of logs you are posting in your comments seem to indicate the right outcome but since we can't see your data/datafiles you are the only one who can sort this out.

ADD REPLY

Login before adding your answer.

Traffic: 1282 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6