Calculating distances between DNA markers
5.8 years ago
milton.andy ▴ 20

I am seeking help with what is possibly a trivial problem, but so far I have not been able to locate a suitable resource on the web, so any pointers would be much appreciated. I have two sorted numerical arrays of unequal length representing the chromosomal positions of two non-overlapping sets of markers (SNPs). They can be represented as [A1, A2, ..., An] and [B1, B2, ..., Bz]. I need to automate the calculation of the differences between each member of set A and each member of set B:

A1 - B1, A1 - B2 ... A1 - Bz
A2 - B1, A2 - B2 ... A2 - Bz
..........................................
An - B1, An - B2 ... An - Bz

The output needs to be a sorted list of the absolute values of these differences.

Given the large number of markers involved, I cannot imagine doing this manually. As I am not familiar with programming, a Bash or Python script would be best.

If this has already been answered on the forum, please post a link. Many thanks, Andy

SNP bash • 1.0k views

Hello milton.andy ,

  • Could you please provide an example of exactly what your input looks like?

  • Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


Hi finswimmer, thank you very much for your reply. A small example of the input can be found here:

https://drive.google.com/file/d/17DR6ZUtXP0jpGxf35E6PqAEERpNz7wpi/view?usp=sharing

The desired output should be as in this textfile: https://drive.google.com/file/d/1R_6duEybnIKn_3P9AI4ixPqlrzYcdI4l/view?usp=sharing

Being a novice, I am having trouble conforming to some of the requirements, such as using the appropriate formatting conventions (alluded to in your second comment). I apologize for this and will try to learn these things in the near future.

Thanks again for your help, Andy


Hello milton.andy ,

Thanks for providing the example. Unfortunately, it is not clear to me. Does setA contain 3 variants and setB 6? Or is this a formatting issue?

The distances you show in your example output seem to be the absolute values of the differences; otherwise A1 - B1 would be negative. Am I right?

fin swimmer


What does the input data actually look like?


An example of the input can be found here:

https://drive.google.com/file/d/17DR6ZUtXP0jpGxf35E6PqAEERpNz7wpi/view?usp=sharing

The desired output should be as in this textfile: https://drive.google.com/file/d/1R_6duEybnIKn_3P9AI4ixPqlrzYcdI4l/view?usp=sharing

Thanks jrj.healey
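
The calculation described in the question (every A position minus every B position, reported as a sorted list of absolute values) could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the real layout of the linked input file is not shown in the thread, so the `read_positions` helper assumes a hypothetical format of one integer position per line, and the positions in the example are made up for illustration.

```python
from itertools import product


def pairwise_abs_distances(set_a, set_b):
    """Return the sorted absolute differences |a - b| for every pair (a, b)."""
    return sorted(abs(a - b) for a, b in product(set_a, set_b))


def read_positions(path):
    """Read positions from a text file.

    Assumption: one integer position per line; blank lines are skipped.
    The actual input format from the thread is unknown.
    """
    with open(path) as handle:
        return [int(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Made-up positions for illustration only, not taken from the linked files.
    set_a = [100, 250, 900]
    set_b = [120, 400]
    for distance in pairwise_abs_distances(set_a, set_b):
        print(distance)
```

With real data, the two lists would come from something like `read_positions("setA.txt")` and `read_positions("setB.txt")`, where both file names are placeholders. Note that the result holds n × z values, so very large marker sets may need a more memory-conscious approach.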
