9 months ago
ManuelDB
I am using FastaAlternateReferenceMaker to generate FASTA sequences that contain the indels I have in a VCF file. I am also using a BED file because I need to create the sequences from specific start and end positions.
In general I get what I need, but a problem appears when regions in my BED file overlap, for example:
chr19 33301307 33301599
chr19 33301556 33301805
chr19 33301647 33301946
chr19 33301898 33302195
chr19 33302152 33302440
which returns this
>39 chr19:33301308-33302440
TGGCCCAGGGCGGTCCCACAGCCGCGCGCCTCACGCGCAGTTGCCCATGGCCTGACCAAG
GAGCTCTCTGGCAGCTGGCGGAAGATGCCCCGCAGCGTGTCCAGTTCGCGGCTCAGCTGT
TCCACCCGCTTGCGCAGGCGGTCATTGTCACTGGTCAGCTCCAGCACCTTCTGCTGCGTC
TCCACGTTGCGCTGCTTGGCCTTGTCGCGGCTCTTGCGCACCGCGATGTTGTTGCGCTCG
CGCCGCACCCGGTACTCGTTGCTGTTCTTAGTCCACCGACTTCTTGGCCTTGCCCGCGCC
GCTGCCGCCACTCGCGCGGAGGTCGGGGTGCGCGGCGCCCAGCCCCTTGAGCGCGCTGCC
AGGGCCCGGCAGGCCGGCGGCACCGAGCGCGGGCGCGGGGTGCGGGCTGGGCACGGGCGT
GGGCGGCGGCGTGGGGTGACCGGGCTGCAGGTGCATGGTGGTCTGGCCGCAGTGCGCGAT
CTGGAACTGCAGGTGCGGGGCGGCCAGGTGCGCGGGCGGCGGGTGCGGGTGCGGGTGCGG
GTGCGAGGGCGGCGGCGGCGGCGGCGGCTGGTAAGGGAAGAGGCCGGCCAGCGCCAGCTG
CTTGGCTTCATCCTCCTCGCGGGGCTCCTGCTTGATCACCAGCGGCCGCAGCGCCGGCGC
CCCGACGCGCTCGTACAGGGGCTCCAGCCTGCCGTCCAGGTAGCCGGCGGCCGCGCAGCC
GTAGCCGGGCGGGGGCCCGTGCGCTCCCCCGGGCATGACGGCGCCGCCGGGGCCCGCGGG
CGCGCCCCGGGTAGTCAAAGTCGCCGCCGCCGCCGCCGCCCGTGGGGCCCACGGCCGCCT
TGGCCTTCTCCTGCTGCCGGCTGTGCTGGAACAGGTCGGCCAGGAACTCGTCGTTGAAGG
CGGCCGGGTCGATGTAGGCGCTGATGTCGATGGACGTCTCGTGCTCGCAGATGCCGCCCA
GCGGCTCCGGGGCGGCAGGTGGGGCGGGAGGCTGCGCGGGGCCCGCGCCCCGGGGAAAGC
CGAAGGCGGCGCTGCTGGGCGCGTGCGGGGGCTCTGCAGGTGGCTGCTCATCGGGGGCCG
CGGCTCCGCCTCGTAGAAGTCGGCCGACTCCATGGGGGAGTTAGAGTTCTCCCGGCATG
By default, the program merges regions that overlap. I cannot find a way to avoid this and get one FASTA sequence for each region in my BED file.
I have been playing around with these parameters:
--interval-merging-rule,-imr <IntervalMergingRule>
Interval merging rule for abutting intervals Default value: ALL. Possible values: {ALL,
OVERLAPPING_ONLY}
and
--interval-set-rule,-isr <IntervalSetRule>
Set merging approach to use for combining interval inputs Default value: UNION. Possible
values: {UNION, INTERSECTION}
This is my code
gatk FastaAlternateReferenceMaker \
-R Homo_sapiens_assembly38.fasta \
-O output_not_merging_interval.fasta \
-L amplicons_coo_v1.1_with_chr.bed \
--variant Unique_indels_in_ROI_chr_sorted_left_align.vcf
If this is not possible (which I doubt), I can write a Python script that loops over the regions, runs the program once per region, and appends each new sequence to a final FASTA file.
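The per-region loop described above could look roughly like the following Python sketch. It is not tested against a real GATK installation. The function names (`bed_to_intervals`, `build_command`, `run_per_interval`) and the temporary file naming are hypothetical; the GATK arguments are the same ones used in the command above. One detail it relies on: BED coordinates are 0-based and half-open, while interval strings passed to `-L` are 1-based and inclusive, so the start is shifted by one (this matches the `chr19:33301308-...` header in the output above).

```python
import subprocess
import tempfile
from pathlib import Path


def bed_to_intervals(bed_lines):
    """Convert BED records (0-based, half-open) to 1-based inclusive interval strings."""
    intervals = []
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and BED header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        intervals.append(f"{chrom}:{start + 1}-{end}")
    return intervals


def build_command(interval, reference, vcf, output):
    """Build one FastaAlternateReferenceMaker call for a single interval."""
    return [
        "gatk", "FastaAlternateReferenceMaker",
        "-R", str(reference),
        "-O", str(output),
        "-L", interval,
        "--variant", str(vcf),
    ]


def run_per_interval(bed_path, reference, vcf, final_fasta):
    """Run GATK once per BED region and append each result to final_fasta."""
    intervals = bed_to_intervals(Path(bed_path).read_text().splitlines())
    with tempfile.TemporaryDirectory() as tmp, open(final_fasta, "w") as merged:
        for i, interval in enumerate(intervals):
            part = Path(tmp) / f"region_{i}.fasta"  # hypothetical naming
            subprocess.run(build_command(interval, reference, vcf, part), check=True)
            text = part.read_text()
            # Make sure consecutive records do not run together.
            merged.write(text if text.endswith("\n") else text + "\n")
```

Calling `run_per_interval("amplicons_coo_v1.1_with_chr.bed", "Homo_sapiens_assembly38.fasta", "Unique_indels_in_ROI_chr_sorted_left_align.vcf", "output_not_merging_interval.fasta")` would then launch one GATK run per region. Because each run only sees a single interval, there is nothing for it to merge, at the cost of starting GATK once per BED line.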