Parsing a VCF file on an FTP server using single-line Perl
Sanger recently released mouse SNPs (in VCF format) from next-generation sequencing of 18 strains. I need to fetch and parse (without downloading the huge file) all SNPs in the gene Impact where strain 129S1 is homozygous for the alternative allele, using a single-line Perl command.
The site: ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v3.snps.rsIDdbSNPv137.vcf.gz
So far, I know I need to do this, but after that I'm fairly lost:
curl ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v3.snps.rsIDdbSNPv137.vcf.gz | gunzip - | perl ...
And yes, this needs to be a single-line Perl command, not VCFtools.
perl
vcf
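Using awk (this keeps the header lines and every record where field 11 contains 1/1; it does not restrict the output to the Impact region):
curl -s "ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v3.snps.rsIDdbSNPv137.vcf.gz" | gunzip -c | \
awk -F '\t' '($0 ~ /^#/ || $11 ~ /1\/1:/)'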
Now, if I use a2p to convert it to Perl:
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
while (<>) {
    chomp;
    @Fld = split(/\t/, $_, -1);
    # keep header lines and records where field 11 (index 10) contains 1/1
    print "$_\n" if ($_ =~ /^#/ || $Fld[10] =~ /1\/1:/);
}
lftp -c 'open -e "zcat mgp.v3.snps.rsIDdbSNPv137.vcf.gz" ftp-mouse.sanger.ac.uk/current_snps/' | \
perl -ne 'chomp; if (/^#CHROM/) { print "$_\n" } else { @a = split /\t/; print "$_\n" if $a[0] eq "18" and $a[1] >= 12972252 and $a[1] <= 12992948 and $a[10] =~ /^1\/1|^1\|1/ }' > mm10.129S1.Impact.altHom &
How is that not downloading the file?