5.5 years ago
Shicheng Guo
This tutorial shares my experience using ALoFT to identify loss-of-function somatic or germline variants.
1) Install ALoFT. Like much other Python software, it can be very hard to install.
2) Download ALoFT from the ALoFT website.
3) Python 2.7.5 (default, Oct 30 2018, 23:45:53) makes it a little easier to install quickly.
cd /home/local/MFLDCLIN/guosa/hpc/tools
wget http://org.gersteinlab.aloft.s3-website-us-east-1.amazonaws.com/aloft-1.0.zip
unzip aloft-1.0.zip
wget http://org.gersteinlab.aloft.s3-website-us-east-1.amazonaws.com/data.zip
unzip data.zip
# Append the PATH update to ~/.bashrc so it persists, then reload it
echo 'export PATH=/home/local/MFLDCLIN/guosa/hpc/tools/aloft/aloft-annotate:$PATH' >> ~/.bashrc
source ~/.bashrc
The commands above install ALoFT, download the annotation database, and add ALoFT to your PATH. Now let's start using it.
aloft --vcf All_samples_Exome_QC.DG.vcf --output All_samples_Exome_QC.DG --data /home/local/MFLDCLIN/guosa/hpc/tools/aloft/aloft-annotate/data.txt
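Before launching the run above, a quick pre-flight check can catch a missing executable or an unreadable input file. The following is a minimal sketch, not part of ALoFT: `preflight` is a hypothetical helper name, and it only assumes that the tool is invoked as `aloft` as in the command above.

```shell
# Pre-flight check: confirm a tool is on PATH and an input file is readable.
# $1 = tool name, $2 = input file. Returns 0 when both checks pass, 1 otherwise.
# (Hypothetical helper for illustration; not shipped with ALoFT.)
preflight() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing: $1 is not on PATH" >&2
        return 1
    fi
    if [ ! -r "$2" ]; then
        echo "missing: cannot read $2" >&2
        return 1
    fi
    echo "ok: $1 found, $2 readable"
}

# Example call:
#   preflight aloft All_samples_Exome_QC.DG.vcf
```

If the first check fails right after installation, open a new shell or re-run `source ~/.bashrc` so the updated PATH is picked up.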
Feedback:
Question: How long does ALoFT take to annotate 2,650,576 variants?
Answer: 20 to 60 minutes, depending on the performance of your machine.
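To compare your own file against that figure, you can count the variant records in the VCF first. This is a rough sketch: `count_variants` is a hypothetical helper name, and it simply counts non-empty lines that do not start with `#`, so a multi-allelic record counts once.

```shell
# Count variant records in a VCF: every non-empty line that is not a '#' header.
# $1 = VCF path; files ending in .gz are decompressed on the fly.
# (Hypothetical helper for illustration.)
count_variants() {
    case "$1" in
        *.gz) gzip -dc < "$1" | awk '!/^#/ && NF { n++ } END { print n + 0 }' ;;
        *)    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1" ;;
    esac
}

# Example call:
#   count_variants All_samples_Exome_QC.DG.vcf
```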
The following R code extracts LoF variants from the pre-calculated hg19 annotation; the shell commands after it shorten the labels and compress the result.
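# List the available prediction files (for reference; the loop builds the names itself)
file <- list.files(pattern = "\\.predict$")
output <- c()
# Loop over the per-chromosome files for autosomes 1-22
for (i in 1:22) {
  input <- paste("chr", i, ".vcf.vat.aloft.lof.predict", sep = "")
  data <- read.table(input, header = FALSE, sep = "\t")
  # Keep rows with column 13 < 0.05 whose column 15 is not "Tolerant",
  # then retain columns 1, 2, 9 and 15 (named CHR, POS, GENE, MODEL below)
  newdata <- subset(data, data[, 13] < 0.05 & data[, 15] != "Tolerant")[, c(1, 2, 9, 15)]
  output <- rbind(output, newdata)
}
colnames(output) <- c("CHR", "POS", "GENE", "MODEL")
write.table(output, file = "aloft.hg19.txt", col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")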
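# Strip the leading "chr" from data rows; anchored and case-sensitive so the "CHR" header stays intact
perl -p -i -e 's/^chr//' aloft.hg19.txt
perl -p -i -e 's/Dominant/Dom/i' aloft.hg19.txt
perl -p -i -e 's/Recessive/Rec/i' aloft.hg19.txt
gzip aloft.hg19.txt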
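Once the table is compressed, you may want to look up a single site without unpacking it. The sketch below assumes the four tab-separated columns written above (CHR, POS, GENE, MODEL) with a header row; `lookup_lof` is a hypothetical helper name.

```shell
# Print the header plus any rows matching a chromosome and position
# from a gzipped table with columns CHR, POS, GENE, MODEL (tab-separated).
# $1 = gzipped table, $2 = chromosome, $3 = position.
# (Hypothetical helper for illustration.)
lookup_lof() {
    gzip -dc < "$1" | awk -F'\t' -v chr="$2" -v pos="$3" \
        'NR == 1 || ($1 == chr && $2 == pos)'
}

# Example call (chromosome and position are placeholders):
#   lookup_lof aloft.hg19.txt.gz 1 100
```

Only the header line is printed when the site is not in the table.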
Potential problems you might meet and how to solve them:
- Most problems are caused by the Python version and the Python setup. Try a different Python version.
- If you run into tab and space (indentation) problems, try:
autopep8 -i aloft
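To confirm that a script really mixes tabs and spaces before rewriting it in place, you can count how many lines start with each. This is a rough sketch with a hypothetical helper name, `indent_report`; it only looks at the first character of each line.

```shell
# Count lines whose indentation starts with a tab versus a space.
# If both counts are non-zero, the file mixes the two indentation styles.
# $1 = script to inspect. (Hypothetical helper for illustration.)
indent_report() {
    awk '
        /^\t/ { tabs++ }
        /^ /  { spaces++ }
        END   { printf "tab-indented: %d\nspace-indented: %d\n", tabs, spaces }
    ' "$1"
}

# Example call:
#   indent_report aloft
```

Python's standard-library `tabnanny` module (`python -m tabnanny aloft`) gives a stricter check if you prefer a Python-native tool.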