Unzipping SAM files
0
0
Entering edit mode
4.2 years ago
storm1907 ▴ 30

Hi, Is there a way to get SAM files into readable FASTA format without samtools? I have a bunch of SAM files, and I need to read header and at least first lines of them. SAM files are gunzipped files, but gunzipping is not working with them.

next-gen software error • 2.6k views
ADD COMMENT
1
Entering edit mode

SAM files are gunzipped files but gunzipping is not working with them

SAM files are not "gunzipped". Also, if a file is already decompressed, decompressing again won't work.

ADD REPLY
0
Entering edit mode

Do you want to make a SAM become a FASTA?

Did you gzip the SAM files? It would be more convenient if they were converted to a BAM file.

check http://seqanswers.com/forums/showthread.php?t=6169

What you want might be done by:

cat filename.sam | awk '{OFS="\t"; print ">"$1"\n"$10}' - > filename.fasta
ADD REPLY
0
Entering edit mode

Hmm, I tried this, and got text file with symbols like these

>^_�^H^@�ZN_^@^C��[sd[��

>^�;i�q^? �x|^V2��폗 ��^_�字���p���㭗��᭗字字��r|= oA�p��?ޭ>=�ڭ^Sۭ�<

>���t@Q�q\]��^V��n^Rz��òز,�+ٲ˲b9���X~션�/Wk9�兘VAՅ慘�?5^Z�^Rv!�^Q^G
ADD REPLY
0
Entering edit mode

What do you see when you try file <your.bam> in the command line? If it happens to be a BAM file, you will see gzip compressed data, extra field whereas it will show ASCII text, with very long lines for a SAM file. You will get a clue as to what kind of compression it is if it's neither.

ADD REPLY
0
Entering edit mode

The thing is that I am not even able to get bam files from sam with samtools. My theory is that bwa mem somehow changes the header in sam, therefore I need to see sam file.

ADD REPLY
0
Entering edit mode

This is the header of one of my PE files after adapter trimming. Looks OK at the moment.

@M06108:35:000000000-J52R8:1:1101:10940:1268 1:N:0:21
CCCATTTCAAATCCTGTAAATCGGATAACAGTGCAAGTACAAACCTACCTCNCTTNGTTGTGTTGNAG
+
CCCCCGCGFGGGFFGGGGGGGGGD@FGGGGGF<CFGFFGGFGGGGGGGGGG#:D@#::DDDCGDF#,6
@M06108:35:000000000-J52R8:1:1101:12586:1268 1:N:0:21
AAATGCTTCAGATGATGACGCATTCACATTAGTAACAAAGGCTGTCCACCANGCG

But after BWA, I get error message from Samtools about "failed to read header"

ADD REPLY
0
Entering edit mode

what is the BWA command you are running?

ADD REPLY
0
Entering edit mode
#!/bin/bash -x


INPATH=dir
OUTPATH=dir

reference=refseq.fasta
cd //tools/bwa-0.7.17
./bwa index $reference
for file1 in $INPATH/*.fastq.gz ;
do
        bname=$(basename $file1 )
        sample=$( cut -d'_' -f 1,2,3<<<"$bname")
        echo $bname $sample
        input1=$INPATH/$sample"_R1_001.paired.fastq.gz"
        output1=$OUTPATH/$sample"_R1_001.paired.sai"
        input2=$INPATH/$sample"_R2_001.paired.fastq.gz"
        output2=$OUTPATH/$sample"_R2_001.paired.sai"
        outfile=$OUTPATH/$sample".sam"
        echo $input1 $output1 $input2 $output2
        ./bwa mem -t 8 $reference $input1 $input2 | gzip -3 > $outfile
        echo $outfile
done
ADD REPLY
0
Entering edit mode
gzip -3

Why are you doing this? You are gzipping your file no wonder you get those binary characters. You can directly feed the output of bwa mem to samtools to generate the sorted BAM files.

bwa mem ref.fa read1.fq read2.fq | samtools sort -o sortedbam.bam -
ADD REPLY
0
Entering edit mode

Yep, I removed it from code. Now it looks like BWA MEM is adding filename before header

>@SQ

>@PG
/dir/subdir/file.paired.fastq.gz
>M06108:35:000000000-J52R8:1:1101:10027:1270
AGTCTTGTAAAACCTTCTTTTTACNTTTACTCTCGTGTTAAAAATCTGAATTCTTCTAGAGTTCCTGATCTTCTGG
>M06108:35:000000000-J52R8:1:1101:10027:1270
ACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATC
ADD REPLY
0
Entering edit mode

This does not look right. You should get an output that is in SAM format (if you just removed the gzip). Page 2 in SAM format spec.

ADD REPLY
0
Entering edit mode

Problem solved. -R option is necessary in BWA MEM - to add read groups header (important for samtools), otherwise I could not get any further

ADD REPLY

Login before adding your answer.

Traffic: 2538 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6