How to extract unmapped mate for specific reads using pysam or other module in python?
2
0
Entering edit mode
2.1 years ago
Xi ▴ 10

Hi,

I am using pysam in python to extract mate reads of a list of target alignments, but get error "ValueError: mate MG01HX01:920:H3YVTCCX2:8:1223:25530:16850: is unmapped". I am just wondering whether pysam can get mate reads for both mapped and unmapped reads? Below is my code. I really appreaciated it if you can give adice for this issue. Thank you!

enter image description here

pysam • 3.1k views
ADD COMMENT
0
Entering edit mode

please advice.

ADD REPLY
0
Entering edit mode

I think you should check your data to ensure that you do have the mate in the BAM file.

And only then troubleshoot the code. Make sure that MG01HX01:920:H3YVTCCX2:8:1223:25530:16850 can be found twice in the data

samtools view input.bam | grep 'MG01HX01:920:H3YVTCCX2:8:1223:25530:16850'

now check the flags etc.

ADD REPLY
1
Entering edit mode
2.1 years ago

No, pysam cannot get the mates when the mate is unmapped read.

The reason for this is that in order to fetch the mate, it must know where the mate is in the BAM file. It does this using the index file. The a bam index tells the parser the position of the reads mapping to each part of each chromosome start in the BAM file as a number of bytes from the start of the file.

As an unmapped read doesn't have a position, the parser cannot find offset to look for it at. In theory the BAM specification could specify that unmapped in a contig, and give its position in the index. Unfortunately it doesn't.

ADD COMMENT
0
Entering edit mode

interesting point, never thought about it that way,

what the OP could do is build a dictionary with unmapped reads (hopefully these are few and far between) store the unmapped alignments there and when the mate is unmapped select it from that storage.

ADD REPLY
0
Entering edit mode

Yep, doable as long as there arn't too many. Would require parsing the file twice.

ADD REPLY
0
Entering edit mode

Yes, thanks! Does pysam has a function to collect the unmapped reads from a bam file? Because my bam file is larger than 200GB, it needs much to to parse through it.

ADD REPLY
0
Entering edit mode

Thanks! Does pysam has a function to collect the unmapped reads from a bam file? Because my bam file is larger than 200GB, it needs much to to parse through it.

ADD REPLY
1
Entering edit mode

No, not directly.

There are two ways to do this.

  1. Open a fetch on the bamfile, and make sure you specify until_eof=True because otherwise fetch doesn't return unmapped reads. Then test every read for read.is_unmapped.
  2. Use samtools view to filter the reads on the unmapped bam into a seperate file first using the unmapped flag. I don't know how fast that will be either. I think samtools commands can be run directly by pysam.

Neither is particularly satisfactory.

Not also that even where you do have properly mapped pairs, AlignmentFile.mate is slow. It is specifically noted in the documentation that it is not suitable for high-throughput work.

ADD REPLY
0
Entering edit mode

Thank you very much!

ADD REPLY
0
Entering edit mode
5 months ago
weisburd • 0

You can get all unmapped read pairs efficiently by running AlignmentFile.fetch('*').

see: https://github.com/pysam-developers/pysam/issues/424#issuecomment-2192755497

AFAIK fetch('*') only returns reads where both mates are unmapped. When a read pair has 1 mate that's mapped and one that's unmapped, the unmapped mate will be located next to their mapped mate in the file, so should be returned by the regular interval query (ie. fetch('chr1:12345-54321')). In contrast, the unmapped read pairs are located at the end of the BAM/CRAM file, and so require a special query.

ADD COMMENT
0
Entering edit mode

Mates of unmapped reads will often be next to their mates as the come out of the aligner (for most aligners). However, that won't be the case I don't think once the BAM is sorted and indexed.

ADD REPLY

Login before adding your answer.

Traffic: 2321 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6