Extracting reads with low mapping score from Bam file
2
0
Entering edit mode
10.4 years ago

Is there a way to extract the reads with low mapping score from Bam files? Samtools will remove reads with low mapping score if we specify the threshold for option -q. However, what I want is to save those reads to another separate file instead of discarding them.

next-gen • 8.4k views
ADD COMMENT
4
Entering edit mode
10.4 years ago

You have a few options:

  • Code up something simple with pysam
  • Just use awk (e.g., samtools view foo.bam | awk '{if($5<10) {print $0}}' > foo.low_qual.sam)
ADD COMMENT
0
Entering edit mode

Thanks very much for your information.

Could I ask you a further question? In the foo.low_qual.sam file, there won't be headers from the original foo.bam file. If so, could I still use samtools to convert the foo.low_qual.sam file into foo.low_qual.bam in order to save space? Also, could I use samtools to generate pileup file from foo.low_qual.bam if there is no problem in converting .sam to .bam?

Thank you so much!

ADD REPLY
1
Entering edit mode

That was just an abbreviated example. You could include headers too:

samtools view -h foo.bam | awk '{if(NF<5) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -

I should note that that's approximate, I haven't ensured that it lacks typos.

ADD REPLY
0
Entering edit mode

The NF<5 criterium does not work to extract the headers. We know that each line of the Sam file header starts with @ character, can I use this information to extract the headers? If so, could you write down the command? I googled but haven't found useful information.

ADD REPLY
1
Entering edit mode

Odd, that should normally suffice. You can also just:

samtools view -h foo.bam | awk '{if($1 ~ /^@/) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -
ADD REPLY
0
Entering edit mode

shorter: awk '($0 ~ /^$@/ || int($5)<10)'

ADD REPLY
0
Entering edit mode

That actually works! Previously, I forgot to add the option -h.

Thanks a lot!

ADD REPLY
0
Entering edit mode

Hi Devon,

Could you explain what NF<5 means? Also, I think there is no difference using the two command line you provided. Is this correct? Obviously, I'd like to adopt the faster one.

ADD REPLY
0
Entering edit mode

NF is "the number of fields". Normally headers only have a small number of fields per-line.

ADD REPLY
1
Entering edit mode
10.4 years ago

using my tool samjs : https://github.com/lindenb/jvarkit/wiki/SamJS

java -jar dist/samjs.jar -e 'record.getMappingQuality() > 20' \
   -X discarded.bam -o accepted.bam in.bam
ADD COMMENT
0
Entering edit mode

Thank you, Pierre.

I would like to try your tool if the IT people in my school is willing to install your javakit for me.

ADD REPLY
0
Entering edit mode

Could I directly download your tool samjs.jar and run the command line you wrote? I checked that I could run java in my cluster. I actually did what you provide in the Download and Install, however, I couldn't run ant. I tried to find samjs.jar, but failed to find it.

Thanks,

ADD REPLY
0
Entering edit mode

What if I cannot use ant on the server? Could I use SamJavascript.java directly? I obtained SamJavascript.java from the github you provided in the link of compilation.

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I have installed ant and done the following steps:

git clone "https://github.com/lindenb/jvarkit.git"
cd jvarkit
git submodule init
git submodule update
cd htsjdk
ant clean
ant
cd ..

However, I couldn't find the samjs.jar using the command find . -name "samjs.jar". Could you tell me where I could get it?

ADD REPLY
0
Entering edit mode

and then you have to `and samjs` https://github.com/lindenb/jvarkit/wiki/SamJS#compilation . The jar should be created in the 'dist' folder.

ADD REPLY
0
Entering edit mode

Eventually, I get the samjs.jar on my Mac. I'm going to move it to the server so that I can use it there.

Thanks so much for your information!

ADD REPLY
0
Entering edit mode

The IT people in my school is unwilling to install ant on the server. Do you know other ways to compile and install your tool samjs?

By the way, is there other way to extract reads with low mapping scores in the Bam file? I don't want to convert the Bam file to Sam file first and then convert it back. I want to deal with the Bam files directly. Do you think that Bamtools would work?

ADD REPLY
0
Entering edit mode
The IT people in my school is unwilling to install ant on the server: Do you know other ways to compile and install your tool samjs: compile it at elsewhere. But you have to know java before moving the files to your school (and that's not my job, sorry).
ADD REPLY

Login before adding your answer.

Traffic: 1550 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6