Check genes from a list, print if match
6.7 years ago
windsur ▴ 20

Hello! I haven't seen any similar question, so:

I have a list of genes and several VCF files. I would like to check every VCF file in a directory against the gene list and, whenever there is a match, return a single table (e.g. Excel) containing the full info line, with the name of the matching file in the first column.

At the moment I have a filter script that works on each file separately, but I don't know how to walk a directory tree and return everything in a single table.

import sys
from glob import glob
from pandas import DataFrame

# Read the gene names (first column), skipping the header line
with open("./genes_rp.txt", 'r') as gene_file:
    gene_list = gene_file.readlines()[1:]

final_list = list()
for gene in gene_list:
    gene = gene.strip('\n').split('\t')
    final_list.append(gene[0].strip())

# sys.argv[1] is the sample folder, including the trailing slash
sample_folder = glob(sys.argv[1] + '*prefiltered.txt')

# Loop over every file (the original slice [1:] skipped the first one)
for sample_path in sample_folder:
    with open(sample_path, 'r') as sample_file:
        sample = sample_file.readlines()

    header = sample[0].strip('\n').split('\t')
    output = list()
    output.append(header)

    # Keep the variants whose gene (first column) is in the list
    for variant in sample[1:]:
        variant = variant.strip('\n').split('\t')
        variant_gene = variant[0]
        if variant_gene in final_list:
            output.append(variant)

    df = DataFrame(output)

    # One Excel file per sample, written next to the input file
    df.to_excel(sample_path + '_rp.xlsx', sheet_name='sheet1', header=False, index=False)

The script above is useful if you have a VCF with a lot of genes and you only want to see a few of them.

python genes list vcf • 1.1k views
6.7 years ago

Use the standard Linux tools. Something like:

find /path/to/dir/ -type f -name "*.vcf" | while read F ; do grep -H -w -o -f genes.txt "$F" | uniq ; done

And please, don't use Excel. Excel is bad.
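If you would rather stay in Python, here is a minimal sketch of the single-table idea from the question. It assumes the same layout as the script in the question: tab-separated files with a header line and the gene name in the first column, with file names ending in `prefiltered.txt`. The function names (`load_genes`, `matching_rows`, `write_table`) are made up for illustration, and the output is a tab-separated text file rather than Excel.

```python
import csv
import os


def load_genes(path):
    """Return the gene names from the first column, skipping the header line."""
    with open(path) as handle:
        next(handle, None)  # skip the header line
        return {line.split("\t")[0].strip() for line in handle if line.strip()}


def matching_rows(root, genes, suffix="prefiltered.txt"):
    """Yield [file_path, col1, col2, ...] for every row whose first column is a listed gene.

    Assumption: each file is tab-separated with one header line.
    """
    for dirpath, _dirnames, filenames in os.walk(root):  # walks the whole tree
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as handle:
                next(handle, None)  # skip this file's header line
                for line in handle:
                    columns = line.rstrip("\n").split("\t")
                    if columns[0] in genes:
                        # The name of the matching file goes in the first column
                        yield [path] + columns


def write_table(rows, out_path):
    """Write all rows to one tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

Usage would be something like `write_table(matching_rows("/path/to/dir/", load_genes("genes_rp.txt")), "matches.tsv")`. The gene names are kept in a set so each lookup stays cheap even with many variants per file.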
