Official announcement:
Samtools Release 1.5 [Solstice Release] (21st June 2017)
- Samtools fastq now has a -i option to create a fastq file from an index tag, and a -T option (similar to -t) to add user specified aux tags to the fastq header line.
- Samtools fastq can now create compressed fastq files, by giving the output filenames an extention of .gq, .bgz, or .bgzf
- Samtools sort has a -t TAG option, that allows records to be sorted by the value of the specified aux tag, then by position or name. Merge gets a similar option, allowing files sorted this way to be merged.
Let's go over each item and see how it works in practice.
Samtools fastq now has a -i option to create a fastq file from an index tag, and a -T option (similar to -t) to add user specified aux tags to the fastq header line.
Help flags for samtools fastq:
...
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
-T TAGLIST copy arbitrary tags to the FASTQ header line
...
Let's give it a go. Get the test file.
curl https://raw.githubusercontent.com/samtools/samtools/develop/test/dat/bam2fq.005.sam > test.sam
Now run:
samtools fastq test.sam | head -4
It prints:
@ref1_grp1_p001/1
CGAGCTCGGT
+
!!!!!!!!!!
whereas:
samtools fastq -T MD,BC,za test.sam | head -4
prints:
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world!
CGAGCTCGGT
+
!!!!!!!!!!
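Once the tags are in the header line they are easy to pull back out with standard tools. The following is a minimal sketch, not part of samtools: fastq_tag is a hypothetical helper name, and it assumes 4-line FASTQ records and tag values without spaces (a value such as "Hello world!" would be cut at the first space).

```shell
# Print the read name and the value of one aux tag from FASTQ header lines.
# fastq_tag is a hypothetical helper name, not a samtools command.
fastq_tag() {
    awk -v tag="$1" '
        NR % 4 == 1 {                      # header lines only (4-line records)
            name = substr($1, 2)           # drop the leading "@"
            value = "NA"                   # reported when the tag is absent
            for (i = 2; i <= NF; i++)      # fields are split on whitespace
                if (index($i, tag ":") == 1)
                    value = substr($i, length(tag) + 4)   # skip "TAG:T:"
            print name "\t" value
        }'
}

# Sample record copied from the output above.
printf '%s\n' '@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world!' \
    'CGAGCTCGGT' '+' '!!!!!!!!!!' | fastq_tag BC
```

With real data the same function can sit at the end of the pipeline: samtools fastq -T MD,BC,za test.sam | fastq_tag BC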
The -i flag is poorly documented and I managed to figure it out only by scouring the test examples on the GitHub repository. It requires setting the --index-format parameter and a file specified via the --i1 parameter to collect the indices into.
samtools fastq -i --i1 indices.fq --index-format 'i2' -T MD,BC,za test.sam | head -4
will produce:
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world! 1:N:0:AC
CGAGCTCGGT
+
!!!!!!!!!!
and a file called indices.fq that contains:
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world! 1:N:0:AC
AC
+
""
@ref1_grp1_p002/1 MD:Z:10 BC:Z:AATT+CCGG za:Z:Another string 1:N:0:AA
AA
+
""
@ref1_grp2_p001/1 MD:Z:8 BC:Z:TG+CA za:Z:!"$%^&*() 1:N:0:TG
TG
+
ab
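A quick sanity check on an index file like this is to tally how often each index sequence occurs. The sketch below is only an illustration: count_barcodes is a hypothetical helper name, and it assumes 4-line FASTQ records with the index sequence on the second line of each record, as in the listing above.

```shell
# Count how often each index sequence occurs in an index FASTQ.
# count_barcodes is a hypothetical helper name, not a samtools command.
count_barcodes() {
    # Line 2 of every 4-line record holds the index sequence.
    awk 'NR % 4 == 2 { n[$0]++ } END { for (b in n) print n[b] "\t" b }' "$@" |
        sort -k1,1nr -k2,2    # most frequent first, ties ordered by sequence
}

# Sample input: the three records from indices.fq shown above.
count_barcodes <<'EOF'
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world! 1:N:0:AC
AC
+
""
@ref1_grp1_p002/1 MD:Z:10 BC:Z:AATT+CCGG za:Z:Another string 1:N:0:AA
AA
+
""
@ref1_grp2_p001/1 MD:Z:8 BC:Z:TG+CA za:Z:!"$%^&*() 1:N:0:TG
TG
+
ab
EOF
```

On a real file the call would simply be: count_barcodes indices.fq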
Samtools fastq can now create compressed fastq files, by giving the output filenames an extention of .gq, .bgz, or .bgzf
Example:
samtools fastq -1 read1.fq.gz -2 read2.fq.gz align.bam
The release note is a bit confusing though. It is not clear what the .gq extension above means. Perhaps it is a typo for .gz, since that works as well, as demonstrated above. Also unclear is the difference between .bgz and .bgzf.
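I do not know how samtools treats each of these extensions, but one way to find out is to look at the bytes that were actually written. The sketch below is an assumption-laden helper, not a samtools feature: classify_compression is a hypothetical name, and the pattern relies on the BGZF header layout described in the SAM specification (gzip magic, the FEXTRA flag, and a "BC" extra subfield of length 2 as the only extra subfield).

```shell
# Classify a file as bgzf, gzip, or plain from its first 16 bytes.
# classify_compression is a hypothetical helper name, not a samtools command.
classify_compression() {
    # Hex dump of the first 16 bytes with all whitespace removed.
    hdr=$(od -An -tx1 -N16 "$1" | tr -d ' \t\n')
    case "$hdr" in
        # gzip magic, FEXTRA flag set, XLEN=6, then the "BC" subfield of length 2
        1f8b0804????????????060042430200*) echo bgzf ;;
        1f8b*) echo gzip ;;
        *) echo plain ;;
    esac
}

# Demo on a throwaway file written with plain gzip.
tmp=$(mktemp)
printf '@r1\nACGT\n+\n!!!!\n' | gzip -c > "$tmp"
classify_compression "$tmp"    # a plain gzip stream has no "BC" subfield
rm -f "$tmp"
```

Running classify_compression read1.fq.gz on the files produced above would show which flavor samtools wrote for a given extension.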
Samtools sort has a -t TAG option, that allows records to be sorted by the value of the specified aux tag, then by position or name. Merge gets a similar option, allowing files sorted this way to be merged.
samtools sort align.bam | samtools view | cut -f 1,12-25 | head -5
prints:
SRR343051.887 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,-55728,101M,0;
SRR343051.542 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,-55615,101M,0;
SRR343051.9863 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,-55587,101M,0;
SRR343051.887 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,+55573,101M,0;
SRR343051.9863 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,+55479,101M,0;
whereas:
samtools sort -t AS align.bam | samtools view | cut -f 1,12-25 | head -5
prints:
SRR343051.1909 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.5040 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.22 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.2588 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.3324 AS:i:0 XS:i:0 RG:Z:foo
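To confirm that the records really come out ordered by the tag, the values can be checked with a short awk filter. This is a rough sketch under a few assumptions: check_tag_order is a hypothetical helper name, the tag is an integer (type i), the order is ascending (which is what the output above suggests), and records lacking the tag are simply skipped.

```shell
# Read SAM records on stdin and report whether the integer values of one
# tag are in non-decreasing order. check_tag_order is a hypothetical helper.
check_tag_order() {
    awk -F'\t' -v tag="$1" '
        /^@/ { next }                          # skip SAM header lines
        {
            for (i = 12; i <= NF; i++)         # optional fields start at column 12
                if (index($i, tag ":i:") == 1) {
                    v = substr($i, length(tag) + 4) + 0   # value after "TAG:i:"
                    if (seen && v < prev) bad = 1
                    prev = v; seen = 1
                }
        }
        END { print (bad ? "unsorted" : "sorted") }'
}

# Two minimal made-up records with AS:i:0 followed by AS:i:101.
printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tA\t!\tAS:i:0\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tA\t!\tAS:i:101\n' |
    check_tag_order AS
```

With the data above it would be used as: samtools sort -t AS align.bam | samtools view | check_tag_order AS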
Why is this post labeled "forum" and not "news"?
Or 'tutorial', maybe?
I was not sure what it ought to be. News would fit if there was just the initial statement. A tutorial label felt like giving it too much importance. Either way would work.
Posts labeled with "forum" have some component that warrants discussion/generates opposing opinions (in my mind). Since you are demonstrating some of the features with example data tutorial may fit better.
I'll make it a tutorial then.