BWA Version : 0.7.0-r313
Picard Version 1.108
samtools Version : 0.1.19-44428cd
1.I have created the bam index files for human_hg19_full.fasta using bwa -a bwtsw human_hg19_full.fasta
comand
This creates the following files
a. human_hg19_full.fasta Size (3199905534)
b. human_hg19_full.fasta.amb Size (8606)
c. human_hg19_full.fasta.ann Size (3762)
d. human_hg19_full.fasta.bwt Size (3137161344)
e. human_hg19_full.fasta.pac Size (784290317)
f. human_hg19_full.fasta.sa Size (1568580680)
2.Created the fai using samtools faidx command ./samtools faidx /kfs/babraham/benchmarks/Align/latest/human_hg19_full.fasta
which creates the following file
a. human_hg19_full.fasta.fai Size (3261)
3.Created the dict file using the picard tool using the following command java -Xmx48g -jar /kfs/aligncall/picard-tools-1.98/CreateSequenceDictionary.jar REFERENCE=/kfs/babraham/benchmarks/Align/latest/human_hg19_full.fasta OUTPUT=/kfs/babraham/benchmarks/Align/latest/human_hg19_full.dict
which creates the dict file
a. human_hg19_full.dict Size (12318)
4.run align and sampe using the following command:
./bwa mem -t 32 -R "@RG\tID:NA12878_30x_pairend\tSM:NA12878_30x\tLB:NA12878_30x\tPL:Illumina" /kfs/babraham/benchmarks/Align/human_hg19_full.fasta \
/kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe1.fastq.gz \
/kfs/aligncall/data/illumina_platinum/NA12878_30x_2-pe2.fastq.gz \
/kfs/babraham/benchmarks/Align/NA12878_30x_2-pe.sam
which creates the NA12878_30x_2-pe.sam created successfully.
5.cat NA12878_30x_2-pe.sam
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
...
@SQ SN:Un_gl000249 LN:38502
@RG ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina
6.Execute samtools view command mentioned below to convert above sam file to bam format.
samtools view -bhS /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.sam > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam
on executing the above command I am getting the following exception mentioned below
[root@kdev01 samtools-0.1.19]# ./samtools view -bhS /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.sam > /kfs/babraham/benchmarks/Align/latest/NA12878_30x_2-pe.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina
' is recognized as '*'.
[main_samview] truncated file.
I've taken the liberty of reformatting things a bit. I'm pretty sure I kept all of the context, but let me know or go ahead and change things if not.
Also, I saved the original header as
foo.sam
andsamtools view -Sbo foo.bam foo.sam
works fine. The error message indicates that samtools thinks the @RG line is a read, though I don't know how it got that idea. You might post the first 1000 lines somewhere (not here) so that we have a reproducible example to look at.wait... your file is just the sam header? The last line is
@RG ID:NA12878_30x_pairend SM:NA12878_30x LB:NA12878_30x PL:Illumina
?Looks like the headers are only present in the SAM file
When I run bwa aln and bwa sampe as separate commands on a different machine, the SAM file that is getting does have information further to this.
ok.... let's take a step back.
1) Why does your mapping spit out only the header ? 2) Why do you care about compressing a mere header ?
Looks like this might be a resource problem with the machine.
The machine which runs bwa mem has only 8 GB of RAM. I have increased this to 16 GB and I am re-executing bwa mem command