Question

How to run dbNSFP_sort.pl to remap coordinates ?

0

Entering edit mode

5.2 years ago

Igor Islanov • 0

Greetings.

Last time i was asking for help about dbNSFP i've noticed my own mistake and fixed it, and later deleted the post. But as of right now new errors appear and i'm strugling to understand reasons for them. Two types of errors, and i dont see the correlation between them, because whenever i re-run the plugin it shows either of them, while remapping different parts. First:

Modification of non-creatable array value attempted, subscript -6 at /mnt/share/WES/scripts/dbNSFP_sort.pl line 64, <STDIN> line 8706838.

Second:

/usr/bin/NSFP: line 8:  4293 Broken pipe             cat dbNSFP${version}_variant.chr*
                  4294 Segmentation fault      (core dumped) | /mnt/share/WES/scripts/dbNSFP_sort.pl 7 8 > dbNSFP${version}_hg19.txt

Code to run plugin copied from Bioinformatics Analysis of Whole Exome Sequencing Data but for newer version of dbNSFP (4.0a) instead of 3.5a

version="4.0a"
cat dbNSFP${version}_variant.chr* \
| /mnt/share/WES/scripts/dbNSFP_sort.pl 7 8 \
> dbNSFP${version}_hg19.txt
bgzip dbNSFP${version}_hg19.txt \
&& tabix -s 1 -b 2 -e 2 dbNSFP${version}_hg19.txt.gz

Is there a possibility to run sort plugin for 4.0 version of dbNSFP or do i have to use 3.5 because of this errors ?

edit: i should probably mention that i put code into bash script, while the env in which all workflow was occuring is conda. Is that the reason that slips me and i just need to fix that ?

edit2: even after typing commands right in terminal and not script - still get errors.

dbNSFP plugin software error • 2.8k views

ADD COMMENT • link updated 5.2 years ago by sacha ★ 2.4k • written 5.2 years ago by Igor Islanov • 0

0

Entering edit mode

Hi Igor, can you help me on an issue of variant annotation with dbNSFP3.5a using SnpSift . I generated the dbNSFP_hg19.gz for hg19 coordinates using the following commands:

unzip dbNSFPv3.5a.zip
head -n1 dbNSFP3.5a_variant.chr1 > h
cat dbNSFP3.5a_variant.chr* | grep -v ^#chr | awk '$8 != "."' | sort -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP_hg19.gz
tabix -s 8 -b 9 -e 9 dbNSFP_hg19.gz

However, I only got 0.36% annotated with this database (using both dbSNP151 and dbSNP150), which is very lower compared to the annotation with the dbNSFP2.9.txt.gz database (15.96%)(using also both dbSNP151 and dbSNP150). Can you please tell me what's wrong ? Thanks

ADD REPLY • link 4.9 years ago by Myosotis1979 ▴ 30

0

Entering edit mode

Sorry for the late response. As for your question: i have absolutely no idea. I started working in this field only recently but i also noticed that after using dbNSFP annotation only small portion of variants have anything, whereas it should be much bigger ammount. I have no clue how to solve it, as i never used previous versions of dbnsfp.

ADD REPLY • link 4.8 years ago by Igor Islanov • 0

score 3 · Answer 1 · 2019-10-02

3

Entering edit mode

5.2 years ago

sacha ★ 2.4k

Hi,

I just translated dbNSFP4 to hg19 using my own script without dbNSFP_sort.pl .

# Copy header 
zcat dbNSFP4.0a_variant.chr1.gz|head -n1 > dbNSFP4_hg19.txt
for file in `ls dbNSFP4.0a_variant.chr*.gz`
do 
    echo "process $file `date`" 
    # switch column 8,9 with 1,2 
    # avoid line without hg19 coordiniate ( $8 != ".") 
    # Be sure hg19 coordinate belong to the same chromosom.( I notice some chr11 in chr17 file )
    zcat $file| grep -v "^#"| awk 'BEGIN{FS="\t";OFS="\t"}$8 != "." && $1==$8 {$1=$8; $2=$9; print $0}' > temp.txt
    # Sort by position . Hint: Use -T option to change /tmp/ directory to store huge data cache 
    sort -k2n temp.txt  >> dbNSFP4_hg19.txt
done

# Compress with bgzip with multithread
echo "compress `date`" 
bgzip -@ 4 dbNSFP4_hg19.txt

echo "Create index `date`" 
tabix -s 1 -b 2 -e 2 -c '#' -S 1 dbNSFP4_hg19.txt.gz

ADD COMMENT • link 5.2 years ago by sacha ★ 2.4k

0

Entering edit mode

Hello and sorry for the late commenting. I just now had the ability to witness how this script works (because somehow it take enormous ammount of time), And here are the first results

process dbNSFP4.0a_variant.chr1.gz Tue Oct  8 09:26:25 MSK 2019
process dbNSFP4.0a_variant.chr10.gz Wed Oct  9 11:28:08 MSK 2019
process dbNSFP4.0a_variant.chr11.gz Wed Oct  9 22:23:45 MSK 2019
process dbNSFP4.0a_variant.chr12.gz Thu Oct 10 10:11:44 MSK 2019
sort: read failed: temp.txt: Transport endpoint is not connected
process dbNSFP4.0a_variant.chr13.gz Thu Oct 10 13:56:22 MSK 2019

And its still ongoing (and i suspect it will for the next couple days). The sort error buggles me, how to fix it ?

EDIT14.10 Seems like its not working 100% as intended. New error

process dbNSFP4.0a_variant.chr2.gz Sun Oct 13 03:17:36 MSK 2019
process dbNSFP4.0a_variant.chr20.gz Sun Oct 13 19:55:34 MSK 2019
process dbNSFP4.0a_variant.chr21.gz Mon Oct 14 01:12:20 MSK 2019
process dbNSFP4.0a_variant.chr22.gz Mon Oct 14 03:23:50 MSK 2019
process dbNSFP4.0a_variant.chr3.gz Mon Oct 14 07:59:10 MSK 2019
sort: write failed: 'standard output': No such file or directory
sort: write error
process dbNSFP4.0a_variant.chr4.gz Mon Oct 14 16:48:00 MSK 2019

ADD REPLY • link 5.1 years ago by Igor Islanov • 0

score 1 · Answer 2 · 2019-09-27

1

Entering edit mode

5.2 years ago

kanika.151 ▴ 160

Hi,

I had the same problem. Apparently, when you try to run the script and merge them together into one file. It worked for me when I did it separately for each chromosome and then combining it into one text file. I hope this helps.

ADD COMMENT • link 5.2 years ago by kanika.151 ▴ 160

0

Entering edit mode

Im not quite a programmer, so im not sure how exactly to do this in multiple files. Could you point me out please on what line to rewrite ?

ADD REPLY • link 5.2 years ago by Igor Islanov • 0

score 1 · Answer 3 · 2019-10-02

Hello!

Sorry for the late reply. This is a very crude way to do it but I will be making a script that incorporates all this and does it in one go.

running it one by one for each chromosome

cat dbNSFP4.0a_variant.chr1 | perl dbNSFP_sort.pl 7 8 > dbNSFP4.0a_chr1_hg19.txt
#find '.' in this directory and concatenate the all the files with this name
find . -type f -name 'dbNSFP4.0a_chr*_hg19.txt' -exec cat {} + >> dbNSFP4.0a_variant_hg19.txt 
head -1 dbNSFP4.0a_variant_hg19.txt >header.txt   #save the header in another file
grep -v "#" dbNSFP4.0a_variant_hg19.txt >dbNSFP4.0a_variant_hg19.2.txt   #for removing multiple headers
rm -f dbNSFP4.0a_variant_hg19.txt #remove multiple files
cat header.txt dbNSFP4.0a_variant_hg19.2.txt >dbNSFP4.0a_variant_hg19.txt #add the header to the file
sort -k1,1 -V -s dbNSFP4.0a_variant_hg19.txt   #sort the file according to the chromosome
# Compress and index
bgzip dbNSFP4.0a_variant_hg19.txt
tabix -s 1 -b 2 -e 2 dbNSFP4.0a_variant_hg19.txt.gz