Hello all,
I am trying to visualise a genome in IGV. I can load my sequencing reads successfully, but I am unable to add the GTF track to the reference genome because the chromosome IDs and coordinates do not match.
In the genome file, each chromosome is split into two parts, while the GTF file uses coordinates of the entire chromosome. I used this split genome file to map my sequencing reads with HISAT2.
What I do have is a BED file which describes how the two parts of each chromosome map to the whole chromosome.
Is there a way to reformat my GTF file such that the coordinates and chromosome names match those of the split chromosome genome file, perhaps using the BED file?
Alternatively, is there a way to load my reads, which were mapped using the split chromosome genome, onto the intact reference genome? I have no issues adding the GTF track to the intact genome in IGV, but my reads won't load onto it.
Many thanks in advance
**Genome file:**
>chr1A_part1
ACCTCGACCTA
**GTF file:**
chr1A GenomeAnnotation exon 10092 17001 . - . transcript_id "Gene1.1"; gene_id "Gene1"; gene_name "Gene1";
**BED file:**
chr1A_part1 0 100000000 chr1A 0 100000000
chr1A_part2 0 50000000 chr1A 100000000 150000000
Are you asking how to do this programmatically, since there are many entries in the GTF?
A "not so smart" way to do this would be to copy the part of the GTF that represents `part1` into a separate file, change `chr1A` to `chr1A_part1` (a find and replace in Notepad++ should do it), and then put the section back into your original GTF.
Thanks for your response. Yes, there are many entries in the genome and GTF file. The problem with altering the names in this way is that `chr1A` could map to either `chr1A_part1` or `chr1A_part2` depending on its position, which would be a lengthy process to figure out, especially for each chromosome. This would also not update the gene coordinates to match the split genome. I think the BED file provides the information I need for the conversion, but I don't know how to utilise it.
If your GTF file is not sorted by coordinate, it should be easy to sort using the AGAT toolkit. Then it should be a matter of figuring out where the boundaries of part 1, part 2, etc. fall and editing the file accordingly.
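The BED file can drive the conversion directly. Below is a minimal Python sketch, not a tested pipeline. It assumes the BED columns are part name, part start, part end, whole-chromosome name, whole-chromosome start, whole-chromosome end (0-based, half-open, as in the example in the question), and that the GTF is tab-delimited with 1-based inclusive coordinates. The function names `load_parts`, `convert_gtf_line` and `convert_gtf` are made up for illustration. Features that straddle a part boundary cannot be assigned to a single part, so they are skipped and counted.

```python
import sys
from collections import defaultdict


def load_parts(bed_lines):
    """Group BED records by whole chromosome.

    Assumed column order: part, part_start, part_end, chrom, chrom_start, chrom_end.
    """
    parts = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        part, p_start, _p_end, chrom, c_start, c_end = line.split()[:6]
        parts[chrom].append((int(c_start), int(c_end), part, int(p_start)))
    return parts


def convert_gtf_line(line, parts):
    """Return the line in split-genome coordinates, or None if it fits no single part."""
    if line.startswith("#") or not line.strip():
        return line  # pass comments and blank lines through unchanged
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[3]), int(fields[4])
    for c_start, c_end, part, p_start in parts.get(chrom, []):
        # BED interval [c_start, c_end) covers 1-based positions c_start+1 .. c_end
        if start > c_start and end <= c_end:
            offset = p_start - c_start
            fields[0] = part
            fields[3] = str(start + offset)
            fields[4] = str(end + offset)
            return "\t".join(fields) + "\n"
    return None  # unknown chromosome, or feature spans a part boundary


def convert_gtf(bed_path, gtf_path, out_path):
    """Write a converted copy of the GTF and report how many features were skipped."""
    with open(bed_path) as bed:
        parts = load_parts(bed)
    skipped = 0
    with open(gtf_path) as gtf, open(out_path, "w") as out:
        for line in gtf:
            converted = convert_gtf_line(line, parts)
            if converted is None:
                skipped += 1
            else:
                out.write(converted)
    print(f"skipped {skipped} features", file=sys.stderr)
    return skipped
```

For example, `convert_gtf("parts.bed", "annotation.gtf", "annotation.split.gtf")` (file names hypothetical) would write the converted annotation. With the BED lines from the question, a feature on `chr1A` at 120000001-120000500 would become `chr1A_part2` at 20000001-20000500. Any skipped features are worth checking by hand, since a gene whose exons fall on both sides of a boundary would otherwise be only partly converted.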