4 months ago
njornet
My sample is a mixture of human DNA from two individuals, and I have pure DNA from one of them (let's call it 1). I want to extract the sequence of the other (let's call it 2), so I suppose I have to use the alignment I get from the pure DNA (1) as a reference to discard the reads from 1 and be left with only the reads from 2. This is nanopore data and I use minimap2 for alignment.
Which software should I use and how?
What do you mean by "pure DNA from one of them"? Do you have that individual's sample sequenced on its own?
While you could use this sample as a reference to pull reads out of the mixture, doing so will extract reads that are common to both samples 1 and 2. Unless you have really long reads, even that may be difficult. When the sequence is shared, there is no way to tell which sample a read belongs to unless the samples were barcoded before the libraries were made.
BTW: You have asked variations of this question over the past few weeks ("Phasing a mixture of two individuals' DNA with long reads" and "Software to separate reads from different individuals").
Yes, I have the sequence of just one individual, but not of the other. I receive the sample already mixed, so there is no way I can barcode them.
The previous questions asked whether I could do this without needing the sequence of one of the individuals, to avoid this step. Seeing that this is not possible, I wanted to try this approach instead.
The only thing I need is the sequence of the individual from whom I cannot obtain uncontaminated DNA. So if some reads carry sequence shared by both individuals and all of them map to the reference from person 1, I would still be getting the sequence of person 2 as well. How would I map the reads to this custom reference from individual 1? Do I have to use a very strict configuration? Can I do that with minimap2, or should I use another program?
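For the mapping step itself, here is a minimal sketch rather than a recommended pipeline. It assumes an assembled FASTA for individual 1 exists and that minimap2 was run with its nanopore preset to produce PAF output. The file names in the comment and the helper names `parse_paf_line` and `best_identity_per_read` are placeholders for illustration, not anything from this thread. The script only summarises per-read alignment identity from the PAF columns; it does not decide which individual a read came from.

```python
# Hypothetical command that would produce the input (file names are placeholders):
#   minimap2 -x map-ont individual1.fa mixture_reads.fastq > mixture_vs_1.paf

def parse_paf_line(line):
    """Return (read_name, identity, mapq) for one PAF record."""
    fields = line.rstrip("\n").split("\t")
    matches = int(fields[9])     # PAF column 10: number of matching bases
    block_len = int(fields[10])  # PAF column 11: alignment block length
    mapq = int(fields[11])       # PAF column 12: mapping quality
    identity = matches / block_len if block_len else 0.0
    return fields[0], identity, mapq


def best_identity_per_read(paf_lines):
    """Keep the highest-identity alignment seen for each read."""
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue
        name, identity, mapq = parse_paf_line(line)
        if name not in best or identity > best[name][0]:
            best[name] = (identity, mapq)
    return best


if __name__ == "__main__":
    # Two made-up PAF records, used only to show the expected input shape.
    demo = [
        "readA\t1000\t0\t1000\t+\tchr1\t5000\t100\t1100\t990\t1000\t60",
        "readB\t2000\t0\t2000\t-\tchr1\t5000\t200\t2200\t1800\t2000\t30",
    ]
    for name, (identity, mapq) in best_identity_per_read(demo).items():
        print(name, round(identity, 3), mapq)
```

Identity computed this way mixes true differences with sequencing errors, so a threshold on it is unlikely to separate the two individuals on its own.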
I am not sure what this project is about, but if it has any clinical significance then you need to be cautious. The sequences of two human samples are going to be largely identical at the kilobase scale, with small changes where the SNPs are. Your quest of "separating" the individual sequences is going to be a difficult (dare I say impossible) task when you are dealing with an unlabeled mixture of genomes. If it works at all, it will at best work partially.
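To put rough numbers on the "largely similar at the kilobase scale" point, here is a back-of-the-envelope sketch. Both rates are assumptions chosen for illustration and were not measured on this data: about one differing site per 1,000 bp between two unrelated people, and a 5% per-base read error rate. The function name `expected_counts` is likewise hypothetical.

```python
def expected_counts(read_length, snp_rate=1e-3, error_rate=0.05):
    """Expected true differences vs. sequencing errors in one read.

    snp_rate and error_rate are illustrative assumptions, not measured values.
    """
    expected_snps = read_length * snp_rate
    expected_errors = read_length * error_rate
    return expected_snps, expected_errors


if __name__ == "__main__":
    for length in (1_000, 10_000, 50_000):
        snps, errors = expected_counts(length)
        print(f"{length:>6} bp read: ~{snps:.0f} informative sites, "
              f"~{errors:.0f} expected errors")
```

Under these assumptions, the sites that could tell the two individuals apart are outnumbered by sequencing errors in every read, which is why alignment identity alone is a weak signal and why longer reads help only because they span more informative sites.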