4.1 years ago
selplat21
I'm trying to write a loop in awk using the following info from two files:
- a file with chromosome in the first column and site in the second column
- a second file with chromosome in the first column and chromosome size in the second column
The sites in the second column run from the first to the last site of each chromosome, and the numbering starts from 1 again on the next chromosome. I need the sites in the first file to be contiguous across chromosomes, so for every chromosome after the first I will need to add the chromosome size to each site.
Any help is appreciated!
Please provide representative input and desired output.
For example, a section of file 1 looks like this (chromosome, site):
The second file looks like this (chromosome, chromosome size):
For every chromosome bigger than 1 in the first file, I need to add the chromosome size of the preceding chromosome to make it continuous. So, the section of file 1 should look like this:
Your question seems unclear about the exact operation being performed, and your output looks suspect (duplicate rows in the output, but the first file contains different "sites").
Can you please simplify the question and double-check what the input and output should look like?
I apologize, one of the sites was accidentally duplicated there. I edited it.
File 1, Column 1 = Chromosome
File 1, Column 2 = Site
Example:
File 2, Column 1 = Chromosome
File 2, Column 2 = Chromosome Size
Example:
Desired Output:
File 1 has a list of sites for each chromosome ranging from 1 to the total chromosome size of that chromosome. Note that some sites are not present because these are filtered sites. However, the maximum site value is the chromosome size and minimum value is 1 for each chromosome. The desired output file makes this first file contiguous between chromosomes so that chromosome 2 would start where chromosome 1 ends. In order to do this, I would add the total chromosome size of Chr1 to all site values of Chr2 in the first file and so on for each subsequent chromosome.
I am simply trying to add the value in file 2, column 2 to each site in file 1, column 2, but the value added comes from the preceding chromosome, so Chr1 is left unchanged. Every later chromosome needs the sizes of all preceding chromosomes added to its site values.
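One possible approach, as a minimal sketch rather than a tested solution for the real data: read the sizes file first, store for each chromosome the running total of all preceding chromosome sizes, then add that offset to every site in the sites file. The file names `sites.txt` and `sizes.txt` and the sample values below are hypothetical (the original example data is not shown here), and the sketch assumes whitespace-delimited columns and that the sizes file lists chromosomes in genome order.

```shell
# Hypothetical sample data, made up for illustration only.
tmpdir=$(mktemp -d)
printf 'Chr1\t5\nChr1\t90\nChr2\t1\nChr2\t40\nChr3\t7\n' > "$tmpdir/sites.txt"
printf 'Chr1\t100\nChr2\t50\nChr3\t80\n' > "$tmpdir/sizes.txt"

# First file (NR == FNR): sizes. offset[chr] holds the summed size of all
# chromosomes listed before chr, so the first chromosome gets an offset of 0.
# Second file: sites. Print the chromosome and the shifted site.
# %.0f is used so large positions are not printed in exponent notation.
result=$(awk '
NR == FNR { offset[$1] = total + 0; total += $2; next }
{ printf "%s\t%.0f\n", $1, $2 + offset[$1] }
' "$tmpdir/sizes.txt" "$tmpdir/sites.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the made-up data above, Chr1 sites stay as they are, Chr2 sites are shifted by 100, and Chr3 sites are shifted by 150 (100 + 50). Note that the order of the two file arguments matters: the sizes file has to come first so the offsets exist before any site is read.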