How do I remove files that only contain a header OR get tabix not to output a file when the input does not have SNPs in the specific region
7.0 years ago
kyle • 0

Hi, I am new to bioinformatics so I apologise if this is an obvious problem that I have missed. I could not find a similar problem online.

I have over 400 vcf files from different Solanum species and I have used tabix to extract my region of interest out of those files. I made a script to run through all of my files. Here is an example of what it looks like:

for f in ~/Location/*.vcf.gz
do
        echo "Processing $f file..."
        # Quote "$f" so paths containing spaces are handled correctly
        tabix -fh "$f" ch01:1000000-5000000 > "$f.my_gene.vcf"
done

Now I have 400+ new vcf files containing only my gene region. I have noticed that a number of the new output files contain nothing but the header of the original file, meaning that there were no variants in my gene region in that file, so they are of no interest to me. Firstly, is there a way to get tabix not to output a file if there are no variants in a region? Alternatively, how can I run through my list of files and delete those that only have a header?

Thanks, Kyle

SNP tabix • 1.8k views

I have 400+ new vcf (...) meaning that there were no variants in that file

Are you sure they all share the same chromosome notation? chr01 != chr1 != 1 != 01


Yes, all files have the same notation

7.0 years ago

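Use grep to test for at least one non-header line, followed by the logical AND operator (&&), so that the second tabix call only writes the output file when the region contains a variant: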

   (...)
  tabix -fh "$f" ch01:1000000-5000000 | grep -m1 -v '^#' > /dev/null && tabix -fh "$f" ch01:1000000-5000000 > "$f.my_gene.vcf"
  (...)
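
The command above prevents new header-only outputs from being written. For output files that already exist, a minimal sketch of the cleanup asked about in the question could look like the following. It assumes the outputs are uncompressed text files named *.my_gene.vcf, as in the question's loop; has_variants and remove_header_only are hypothetical helper names for illustration, not part of tabix.

```shell
#!/usr/bin/env bash
# Sketch: delete extracted VCF files that contain only header lines.
# Assumption: outputs are plain-text files named *.my_gene.vcf.

# Succeeds (exit 0) if the file has at least one line that does not
# start with '#', i.e. at least one variant record.
has_variants() {
    grep -q -v '^#' "$1"
}

# Remove every header-only *.my_gene.vcf file in the given directory.
remove_header_only() {
    local dir=$1 f
    for f in "$dir"/*.my_gene.vcf; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if ! has_variants "$f"; then
            echo "Removing header-only file: $f"
            rm -- "$f"
        fi
    done
}
```

It would be called as remove_header_only ~/Location after the extraction loop has finished. Replacing the rm line with a plain echo first is a cheap way to check which files would be deleted.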

Thank you sir! Worked perfectly.
