Hello, I have mouse exome data, for which I downloaded the reference from UCSC. I am trying to create an interval list file from the Agilent SureSelect BED file.
This is the command I am using
java -jar picard.jar BedToIntervalList I=S0276129_Covered.bed O=S0276129_Covered.interval_list SD=mm10_genome.dict
This is the error I get
chr1 was past the end: 195471971 < 196469947
chr5 was past the end 151834684 < 151842168
chr7 was past the end: 145441459 < 145451439
chr8 was past the end: 129401213 < 129458847
chr12 was past the end: 120129022 < 120129244
chr14 was past the end: 124902244 < 125075837
chr16 was past the end: 98207768 < 98218510
chr17 was past the end: 94987271 < 95126542
chr18 was past the end: 90702639 < 90702728
Any idea why the sequence lengths differ between the dictionary file and the interval file, and how I can correct this?
Many thanks,
Thanks Pierre. The sequence dictionary was created from the FASTA file with Picard's CreateSequenceDictionary. Is there any possibility of that going wrong?
Also, the bad lines can be removed, but will this affect the "target areas" for variant calling?
No, so the problem would come from your BED (is it mm10?).
We have no idea how you are going to use this interval file: which tool?
You can always trim the BED.
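As a rough illustration of what "trimming" could look like, here is a minimal Python sketch. It assumes the .dict file is a plain SAM-style header with tab-separated @SQ lines carrying SN and LN tags, and that the BED is tab-separated with 0-based, half-open coordinates. The function names (read_dict_lengths, trim_bed, write_trimmed) are made up for this example and are not part of Picard or GATK. Intervals that run past the contig end are clipped to the contig length; intervals that start at or beyond the end, or sit on contigs missing from the dictionary, are dropped.

```python
def read_dict_lengths(dict_lines):
    """Parse the @SQ lines of a sequence dictionary into {contig: length}."""
    lengths = {}
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(
            f.split(":", 1)
            for f in line.rstrip("\n").split("\t")[1:]
            if ":" in f
        )
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def trim_bed(bed_lines, lengths):
    """Yield BED lines clipped to the contig lengths in `lengths`."""
    for line in bed_lines:
        # Skip blank lines and header lines (track/browser/comments);
        # they are not copied to the output in this sketch.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, start, end = cols[0], int(cols[1]), int(cols[2])
        limit = lengths.get(chrom)
        # Drop intervals on unknown contigs or starting past the contig end.
        if limit is None or start >= limit:
            continue
        # BED ends are exclusive, so an end equal to the length is valid.
        cols[2] = str(min(end, limit))
        yield "\t".join(cols)


def write_trimmed(dict_path, bed_path, out_path):
    """Read a .dict and a BED from disk and write the trimmed BED."""
    with open(dict_path) as d:
        lengths = read_dict_lengths(d)
    with open(bed_path) as b, open(out_path, "w") as out:
        for bed_line in trim_bed(b, lengths):
            out.write(bed_line + "\n")
```

With the files from the question, the call would be write_trimmed("mm10_genome.dict", "S0276129_Covered.bed", "S0276129_Covered.trimmed.bed"), where the output name is just a placeholder. Note that this only clips coordinates. The chr1 overrun in the error above is roughly 1 Mb, which may be worth checking against the assembly the BED was designed on, because clipping would not correct the remaining coordinates if the BED and the dictionary are on different builds.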
Yes, it's mm10. This interval file is being prepared for GATK variant calling.