This question is about g2gtools: https://github.com/churchill-lab/g2gtools I am not sure how much help I can get here, but I would be glad to hear any feedback.
I am trying to create a chain file using the reference genome of my model organism and a population-level indel.vcf file. NOTE: This tool is used to build an alternate reference genome and transcriptome database.
I am using the script described here: https://github.com/churchill-lab/g2gtools which matches the script for the example files in https://github.com/churchill-lab/sysgen2015/blob/master/markdown/RNASeq_pipeline.md I was able to run the tool successfully on the example data and got the expected outputs.
I then tried it on my own data, using the reference genome (sorted and indexed) and indels.vcf (sorted, indexed, and formatted according to the VCF specification), but I am getting an error. I have tried to make sure the VCF is not corrupted and meets all the requirements. In fact, this indel.vcf was created using the same reference genome that I am giving to the tool, so there shouldn't be any incompatibilities. I also compared my indel.vcf with the example.indel.vcf, and both follow the same format. I still get the error message in the terminal. Part of the output is:
VCF FILE: /media/everestial007/Seagate-ExtHDD/DATA_analyses/ASE_analysis-using_g2gtools/passed_indelsMA622.sorted.vcf.gz
FASTA FILE: /media/everestial007/Seagate-ExtHDD/DATA_analyses/ASE_analysis-using_g2gtools/lyrata_sorted.fa
CHAIN FILE: /media/everestial007/Seagate-ExtHDD/DATA_analyses/ASE_analysis-using_g2gtools/sorted-ref-to-MA622.chain
STRAIN: MA622
PASS FILTER ON: False
QUALITY FILTER ON: False
DIPLOID: False
STRAIN SAMPLE INDEX: 0
Parsing VCF file...
Processing Chromosome 1...
Processing Chromosome scaffold_24...
Processing Chromosome scaffold_86...
Processing Chromosome scaffold_118...
Processing Chromosome scaffold_149...
Processing Chromosome scaffold_184...
Processing Chromosome scaffold_214...
Processing Chromosome scaffold_54...
Unable to parse record, improper VCF file?
Unable to parse record, improper VCF file?
Unable to parse record, improper VCF file?
Unable to parse record, improper VCF file?
Unable to parse record, improper VCF file?
Unable to parse record, improper VCF file?
Can someone suggest what might be going wrong? I have taken every measure available to make sure the VCF file is good (and it was generated using the same reference genome used in the g2gtools pipeline).
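To narrow down which records might be triggering the message, one option is a quick structural scan of the VCF. The sketch below is only an illustration: the function names (open_vcf, find_suspect_records) are hypothetical, and the checks are generic VCF sanity checks that I am assuming are relevant, not the actual conditions g2gtools tests internally.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def find_suspect_records(lines):
    """Yield (line_number, reason) for data lines that fail basic checks.

    These are generic structural checks (assumed, not taken from g2gtools).
    """
    n_header_cols = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            n_header_cols = len(line.split("\t"))
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            yield lineno, "fewer than 8 tab-separated columns"
            continue
        if n_header_cols is not None and len(fields) != n_header_cols:
            yield lineno, "column count differs from #CHROM header"
            continue
        if not fields[1].isdigit():
            yield lineno, "POS is not an integer"
        elif fields[3] in ("", ".") or fields[4] == "":
            yield lineno, "missing REF or ALT"


# Example use on a file (path taken from the log above, shortened):
# with open_vcf("passed_indelsMA622.sorted.vcf.gz") as fh:
#     for lineno, reason in find_suspect_records(fh):
#         print(lineno, reason)
```

If this reports nothing, the records are at least structurally well formed, and the problem is more likely in how specific fields (for example the genotype column for the MA622 sample) are interpreted.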
Thanks very much in advance!