I am trying to check the positions in one file against the intervals in a second file to find out whether the features overlap. File a looks like this (typical VCF entries, many of them):
chr1 1161692 chr1uGROUPERuDELu0u832 TGCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGGAAACATGGCACCTCCCCTCTGGG T 63
chr1 249174066 chr1uGROUPERuDELu0u832 TGCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGGAAACATGGCACCTCCCCTCTGGG T 63
chr1 249175897 chr1uGROUPERuDELu0u832 TGCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGGAAACATGGCACCTCCCCTCTGGG T 63
I have a file Pt that looks like this (tab-delimited):
chr1 249174065 249174067
chr1 249175897 249175899
I wrote:
for line in a:
    line = line.strip().split()
    for row in masterlist:
        row = row.strip().split()
        w=[]
        if (line[0] == row[0]):
            f = range(int(row[1]),int(row[2]))
            w.append(line[1])
            for i in w:
                i = int(i)
                if i in f:
                    print line
                else:
                    break
        else:
            break
There is a problem, though. Both entries in the Pt file should match, but the script only reports the first entry from the Pt file. If the first entry does not match, there is no output at all. I want the script to output all matches.
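One likely cause, assuming masterlist is an open file handle: the inner loop consumes it while the first VCF line is processed, so later lines have nothing left to compare against. The break statements can also end the scan of the regions early. A minimal sketch of a fix, written in Python 3 syntax, reads the regions into memory once and tests every region on the matching chromosome. The function names load_regions and matching_lines are illustrative only, the sequence column in the example data is shortened, and intervals are treated as half-open (start <= pos < end), as range() does in the script above.

```python
from collections import defaultdict


def load_regions(rows):
    """Group (start, end) pairs by chromosome from whitespace-delimited rows."""
    regions = defaultdict(list)
    for row in rows:
        fields = row.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        regions[fields[0]].append((int(fields[1]), int(fields[2])))
    return regions


def matching_lines(vcf_lines, regions):
    """Yield each VCF line whose position lies in any region (start <= pos < end)."""
    for line in vcf_lines:
        fields = line.split()
        if len(fields) < 2 or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos = fields[0], int(fields[1])
        # any() checks every region on this chromosome instead of stopping early
        if any(start <= pos < end for start, end in regions.get(chrom, ())):
            yield line.rstrip("\n")


# Example with the data from the question (sequence column shortened here)
vcf = [
    "chr1 1161692 chr1uGROUPERuDELu0u832 TGCT T 63",
    "chr1 249174066 chr1uGROUPERuDELu0u832 TGCT T 63",
    "chr1 249175897 chr1uGROUPERuDELu0u832 TGCT T 63",
]
pt = ["chr1\t249174065\t249174067", "chr1\t249175897\t249175899"]

hits = list(matching_lines(vcf, load_regions(pt)))
for hit in hits:
    print(hit)
```

With real files, the lists vcf and pt would be replaced by open file objects, since both functions only iterate over lines.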
Do you just want to extract subsets of the vcf file that are within certain regions? You could just use vcf-query
Yes, I do, but the number of these regions is quite high. Can I pass a file with these regions to vcf-query?
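If the filtering is done in Python rather than with vcf-query, a large number of regions makes it slow to compare every VCF position against every region. One possible sketch, in Python 3 syntax, sorts and merges the regions per chromosome once and then finds the candidate region by binary search with the standard bisect module. The names build_index and in_region are hypothetical, and intervals are assumed to be half-open (start <= pos < end).

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(rows):
    """Return {chrom: (starts, ends)} with regions sorted and overlaps merged."""
    by_chrom = defaultdict(list)
    for row in rows:
        fields = row.split()
        if len(fields) >= 3:
            by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # overlaps or touches the previous interval: extend it
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_region(index, chrom, pos):
    """True if start <= pos < end for some region on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    return i >= 0 and pos < ends[i]


index = build_index(["chr1\t249174065\t249174067", "chr1\t249175897\t249175899"])
positions = (1161692, 249174066, 249175897, 249175899)
results = [in_region(index, "chr1", p) for p in positions]
print(results)
```

Each lookup then costs a binary search over the regions of one chromosome instead of a full scan, which matters most when both files are large.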