Hi, I am new to bioinformatics so I apologise if this is an obvious problem that I have missed. I could not find a similar problem online.
I have over 400 VCF files from different Solanum species, and I have used tabix to extract my region of interest from each of them. I wrote a script to loop over all of my files. Here is an example of what it looks like:
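# Loop over the bgzipped, tabix-indexed VCF files directly
for f in ~/Location/*.vcf.gz
do
    echo "Processing $f file..."
    # -h keeps the VCF header in the output; quoting protects paths with spaces
    tabix -fh "$f" ch01:1000000-5000000 > "$f.my_gene.vcf"
done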
Now I have 400+ new VCF files containing only my gene region. However, I have noticed that a number of the output files contain nothing but the header of the original file, meaning there were no variants in that file for my gene, so those files are not of interest to me. Firstly, is there a way to get tabix not to write an output file if there are no variants in the region? Alternatively, how can I loop over my list of files and delete those that contain only a header?
Thanks, Kyle
Are you sure they all share the same chromosome notation?
chr01 != chr1 != 1 != 01
Yes, all files have the same notation.
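For the second question, one possible approach is to test each output file for at least one non-header line and delete it otherwise. The following is only a minimal sketch: the function names has_variants and remove_header_only are made up for illustration, and it assumes the outputs are uncompressed VCF files in which every header line starts with '#'.

```shell
# has_variants FILE (hypothetical helper)
# Succeeds if FILE contains at least one line whose first character is not '#',
# i.e. at least one variant record. Empty and header-only files fail the test.
has_variants() {
    grep -q '^[^#]' "$1"
}

# remove_header_only FILE... (hypothetical helper)
# Deletes every given file that has no variant records.
remove_header_only() {
    for f in "$@"; do
        if ! has_variants "$f"; then
            echo "Removing header-only file: $f"
            rm -- "$f"
        fi
    done
}
```

With the file naming from the question, this could be called as: remove_header_only ~/Location/*.my_gene.vcf. The same has_variants check could also be run inside the extraction loop right after each tabix call, so that header-only outputs are removed as soon as they are written. It may be worth replacing rm with echo for a first dry run.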