How can I save the output from samtools and awk as a bam file?
7
0
Entering edit mode
8.3 years ago
GK1610 ▴ 120

My input file .bam files with mapped reads is bam1 from step 1

Step 1 samtools view -F 0X4 > **bam1**

Step 2 I want to keep only those reads which are paired and mapped. so I got this flag no from here

samtools view **bam1** | awk '{ if ($2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 || $2 == 67 || $2 == 131 || $2 == 115 || $2 ==179 && $9 >= -10000 && $9 <= 10000) > **bam2**

I have 20 bam2s from different samples and I want to merge them but I am getting this error bamtools merge ERROR: could not open input BAM file(s)... Aborting.

Step 3 I then redid step 2

samtools view -H **bam1** > bam2

samtools view **bam1** | awk '{ if ($2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 || $2 == 67 || $2 == 131 || $2 == 115 || $2 ==179 && $9 >= -10000 && $9 <= 10000) > **bam2**

samtools view -bS t**bam2** >  **newbam2**

**[E::sam_parse1] missing SAM header
[W::sam_read1] parse error at line 1
[main_samview] truncated file.**

how can I resolve this issue?

ChIP-Seq • 7.2k views
ADD COMMENT
0
Entering edit mode

Where did you create a file called tbam2?

Also, if there is a header in bam1, surely the awk after the pipe removes it. Perhaps you should add another if statement there to deal with header lines.

Also, for more clarity, please use four spaces before commands or the "code sample" function.

Also, in your awk, do you think

&& $9 >= -10000 && $9 <= 10000

affects all the "ors" or just the last one?

$ cat test.file
1   1
2   2
$ awk 'BEGIN{OFS=FS="\t"}{if($1==1 || $1==2 && $2==2){print}}' test.file
1   1
2   2
$ awk 'BEGIN{OFS=FS="\t"}{if(($1==1 || $1==2) && $2==2){print}}' test.file
2   2
ADD REPLY
5
Entering edit mode
8.3 years ago

Sometimes it's easier to write a little python script:

#/usr/bin/env python
import pysam

bam = pysam.AlignmentFile("your file.bam")
output = pysam.AlignmentFile("your output.bam", template=bam)
for read in bam.fetch():
    if read.is_paired and not read.is_unmapped and not read.mate_is_unmapped and abs(read.template_length) < 10000:
        output.write(read)
bam.close()
output.close()

I haven't actually run that, so there may be a typo, but that'll be rather more convenient than a multistep samtools->awk->samtools thing.

ADD COMMENT
0
Entering edit mode
import pysam
bf = pysam.AlignmentFile(fname, "rb")
output = pysam.AlignmentFile("output.bam", "w", template=bf)
for read in bf.fetch(until_eof=True):
if read.is_paired and not read.is_unmapped and not read.mate_is_unmapped and abs(read.template_length) < 10000:
output.write(read)
bf.close()
output.close()

I am getting this error 
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "pysam/calignmentfile.pyx", line 1345, in pysam.calignmentfile.AlignmentFile.write (pysam/calignmentfile.c:14774)
File "pysam/calignmentfile.pyx", line 1374, in pysam.calignmentfile.AlignmentFile.write (pysam/calignmentfile.c:14696)
ValueError: sam write failed

do you know what could have been wrong?

ADD REPLY
1
Entering edit mode

Presumably 'wb' rather than 'w' on the third line.

ADD REPLY
2
Entering edit mode
8.3 years ago

using samjs : https://github.com/lindenb/jvarkit/wiki/SamJS

java -jar dist/samjs.jar  -e 'record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && Math.abs(record.getInferredInsertSize())< 10000'   -o out.bam in.bam
ADD COMMENT
1
Entering edit mode

That's pretty neat...

ADD REPLY
0
Entering edit mode

Pierre, out of curiosity, can you comment on the use of JavaScript for bioinformatics tasks like this one? What advantages does it have over a Java or python solution? I know nothing about Javascript and it's not a language we see very often around here...

ADD REPLY
1
Entering edit mode

, can you comment on the use of JavaScript for bioinformatics tasks

sure. Java contains a javascript engine .

One can insert any java object into a javascript context; Hence, any class that have been written for the java library for HTS (htsjdk) can be used as a simple javascript object without writing a new specific js API from scratch and we can stick to the BAM specification.

ADD REPLY
0
Entering edit mode

Thanks Pierre

One can insert any java object into a javascript context

But why would you want to do that in the first place? Why not just write a pure Java program? Is it because JavaScript is easier to develop than Java, a bit like Scala?

ADD REPLY
1
Entering edit mode

Why not just write a pure Java program?

sounds like "why writing an awk script when you could write a C program" :-)

it's faster to write and run ( No need to create a source code + no need to compile). But the job will take more time than a pure compiled java program.

ADD REPLY
2
Entering edit mode
8.3 years ago
GK1610 ▴ 120

solution

`samtools view -h **input.bam** | awk '$1~"@" || $2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 '| awk '$1~"@" || $9 >= -10000 && $9 <= 10000'| samtools view -Shb - > **output.bam**

`java -jar dist/samjs.jar  -e record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && Math.abs(record.getInferredInsertSize())< 10000'   -o out.bam in.bam
ADD COMMENT
1
Entering edit mode
6.5 years ago
samtools view --threads 2 -h in.bam | awk '{if($3 != "chrM" && $3 != "chrUn"){print $0}}' | samtools view --threads 2  -Shb - > out.bam
ADD COMMENT
0
Entering edit mode
awk -F '\t' '($0 ~/^@/ || !($3=="chrM" || $3=="chrUn"))'
ADD REPLY
0
Entering edit mode
8.3 years ago
GK1610 ▴ 120

okay I really want this to work

suggestion1. pysam doesnt work

suggestion2. picard--doesnt solve the problem of removing reads and mates with fragment length >10KB

all I want is to convert samtools view bam file + awk operations ---> readable bam file

samtools view bam1 | awk '{ if ($2 == 99 || $2 == 147 || $2 == 83 || $2 == 163 || $2 == 67 || $2 == 131 || $2 == 115 || $2 ==179 && $9 >= -10000 && $9 <= 10000) > "**readable bam file**"
ADD COMMENT
0
Entering edit mode

doesnt solve the problem of removing reads and mates with fragment length >10KB

updated my code with :

&& Math.abs(record.getInferredInsertSize())< 10000

...

ADD REPLY
0
Entering edit mode
8.3 years ago

For one thing, samtools view something.bam is going to output a sam, not a bam. You should be able to confirm this by just looking at it.

Second, you should only need those first 4 flag values, I do't think you can have a properly paired pair where both are in the same direction.

Third, you need to use >> to write in append mode. Your second command that has > bam2 is over-writing the results from the first one.

ADD COMMENT
0
Entering edit mode

Second, you should only need those first 4 flag values, I do't think you can have a properly paired pair where both are in the same direction.

Así es la vida : Tophat : Sam-Flag 115 = Properly-Paired + Read.Reverse + Mate.Reverse ?

ADD REPLY
0
Entering edit mode
6.5 years ago

Am being dense? What's wrong with:

samtools -F 12 -F 3 -bh input.bam  > bam2.bam

explaination:

-F excludes reads with any of the bits set. 12 is 0x4 + 0x8: Read is unmapped, mate is unmapped

-f includes only reads with all the bits set. 2 is 0x1 + 0x1: Read is paired. Read is in proper pair.

-b is output bam (rather than the default sam)

-h is output header.

Still would of course depend on your mapper setting the proper paired flag properly, but most do right?

ADD COMMENT

Login before adding your answer.

Traffic: 1631 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6