Choosing sam flags to extract all reads mapped to 5' -> 3' OR 3' -> 5' of reference sequence?
3
0
Entering edit mode
7.1 years ago
johnnytam100 ▴ 110

I am trying to extract all the reads which : 1) mapped to 5' -> 3' OR 3' -> 5' of reference sequence AND 2) paired but one of the reads is unmapped.

To meet requirement 2), I read from here or here that the sam flags are 73, 133, 89, 121, 165, 181, 101, 117, 153, 185, 69 and 137.

However, to meet requirement 1), out of the flags stated above, what is the simple method to determine whether the reads are mapped to 5' -> 3' OR 3' -> 5' of the reference sequence?

I have also read from slide 32 of here but was quite difficult when I start to consider combinations.

Thanks a lot!!!

sam bwa flag mapping • 4.0k views
ADD COMMENT
0
Entering edit mode

Do you have any language you must use? Or it doesn't matter?

ADD REPLY
0
Entering edit mode

I work in a linux system so maybe it doesn't matter.

ADD REPLY
1
Entering edit mode
7.1 years ago

I don't think you can do this using a one-pass samtools.

using samjdk:

 java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag()));' input.bam
ADD COMMENT
0
Entering edit mode

How could I divide the reads into 5' -> 3' and 3' -> 5' direction using the script? Thank you so much!

ADD REPLY
1
Entering edit mode

BTW, thank you so much for sharing the slides, maybe that is one of the few explanations of the sam flags that visualize how the reads look like.

ADD REPLY
0
Entering edit mode

it's not clear if you want to divide the sam flags . Your question looks like `( (cond1 || cond2) && cond3)'.

ADD REPLY
0
Entering edit mode

Sorry for making it confusing. I am extracting 2 groups of reads : Group 1 : paired but one of the reads is unmapped AND mapped 5'->3' of reference. Group 2 : paired but one of the reads is unmapped AND mapped 3'->5' of reference.

ADD REPLY
1
Entering edit mode

negative strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' input.bam

positive strand:

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getMateNegativeStrandFlag()) || (!record.getReadUnmappedFlag() && record.getMateUnmappedFlag() && !record.getReadNegativeStrandFlag()));' input.bam
ADD REPLY
0
Entering edit mode

Great! You are of great help!!! After I compiled samjdk according to the download and compile section here and tried to run the command of negative strand, the following error appeared: Error: Invalid or corrupt jarfile. Any idea on how to fix this? Thanks again.

ADD REPLY
0
Entering edit mode

what was your exact command line please, and your java version.

ADD REPLY
0
Entering edit mode

Sorry for keeping you waiting. Here are the information :

Installation command :

git clone "https://github.com/lindenb/jvarkit.git"
cd jvarkit
make samjdk

Command to extract negative strand:

java -jar /data/tam/script/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/samjs/SamJdk.java
-e 'return record.getReadPairedFlag() &&((record.getReadUnmappedFlag() && () && record.getMateNegativeStrandFlag()) || (() && record.getMateUnmappedFlag() && record.getReadNegativeStrandFlag()));' /data/tam/genome/mapping/20171001_3TTN_TTNR1_SPadesTTNcontigs_mapped_TTNTTNRoriginalreads_defaultmismatch/libextract-defaultmismatch_TTN_footprint_only_3TTN_TTNR1_read_list_aln.sorted.bam

Java version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
ADD REPLY
0
Entering edit mode
7.1 years ago
glihm ▴ 660

I strongly recommend you to read the documentation of SAMTOOLS and associated HTSlib. They are very powerful tools and you must understand them to go further in the SAM formatted files processing.

Solution 1: You have some time and you can make a list of all the flags you are interested in (You can use the tools you mentioned and also Picard Tool to explain SAM flags). So, using samtools view -F YOURLISTOFFLAGS input.bam you'll be able to extract the reads you want for.

Solution 2: You can use HTSlib in C (or in other languages like Python with Pysam or Java with samjdk mentioned by Pierre Lindenbaum). Is so, you have to focus your work on the BITWISE operators. They allow you a quicker and better way to acces all the flags you want to extract your reads of interest.

At a glance for the solution 2, if you don't care about the strandness (you want 5'->3' and 3'->5'), you don't have to test the strandness bit (0x10). So, the work is now simple, you just have to test if the read is paired (test on the bit 0x1) and if one of the two read is unmapped (bit 0x4 OR 0x8). You've done. :)

ADD COMMENT
0
Entering edit mode

Actually I care about strandness, maybe I haven't stated it clearly enough in the question. :) So do you mean reads with strandness bit (0x10) is one direction, and without strandness bit (0x10) is another direction?

ADD REPLY
0
Entering edit mode

Correct, if 0x10 is set (1), read is on reverse strand. Else (0), read is on forward strand. ;)

ADD REPLY
0
Entering edit mode
7.1 years ago
GenoMax 147k

Using splitsam.sh from BBMap suite.

splitsam.sh mapped.sam forward.sam reverse.sam
reformat.sh in=forward.sam out=plus.fq
reformat.sh in=reverse.sam out=minus.fq rcomp
ADD COMMENT
0
Entering edit mode

Thank you very much! Actually I only have one .sam per library mapped, do I have to divide them into separate .sam before using splitsam.sh?

ADD REPLY
0
Entering edit mode

Splitsam is doing the split for you into two separate sam files. You can then convert them back to fastq reads by the reformat command above.

ADD REPLY
0
Entering edit mode

So Splitsam will output forward.sam and reverse.sam. How about the plus.sam and minus.sam? Are they also input files? Thank you!

ADD REPLY
0
Entering edit mode

Post edited above.

ADD REPLY

Login before adding your answer.

Traffic: 1637 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6