I'm fairly new to NGS and bioinformatics... I performed ChIP-seq on an E. coli strain into which I had inserted some nucleotides. I had no problem adding the nucleotides and creating a FASTA file for mapping (I downloaded the GenBank file of the background genome from NCBI, added the nucleotides in SnapGene and exported the result as a FASTA file)... Now I want to visualize the data in IGV and would love to add the annotation as well... My GenBank file already contains all the annotation information; is there a way to create an annotation file from it? I found that I can see the annotation if I load the .gb file directly into IGV, but in that case my MACS2 output doesn't show up at all (the track is empty)... If I load only the FASTA file, I can visualize the peaks, but then I can't see the annotations. Is there a recommended way to create a modified annotation file (I only have one long insert in the genome)?
Hi, you can use awk to extract certain columns of GFF files (e.g. accession number, start, stop, strand, ...) and add rows for your own annotations. You need to check which GFF columns IGV requires. If you provide some examples, I may be able to help better.
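A minimal sketch of the idea above, written in Python rather than awk: keep only selected columns of a GFF3 file (splitting on tabs, as an awk one-liner would) and build one extra row by hand for the inserted sequence. The function names, the `manual` source and the `region` feature type are illustrative assumptions, not something IGV requires; a full GFF3 feature line has nine tab-separated columns with 1-based, inclusive coordinates.

```python
# Sketch: keep selected GFF3 columns (like an awk one-liner splitting on tabs)
# and build one extra feature row by hand.

GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def extract_columns(gff_text, wanted=("seqid", "start", "end", "strand")):
    """Return a list of tuples holding the requested GFF3 columns."""
    idx = [GFF_COLUMNS.index(name) for name in wanted]
    rows = []
    for line in gff_text.splitlines():
        # Skip blank lines, ##directives and #comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a feature line (e.g. sequence after ##FASTA)
        rows.append(tuple(fields[i] for i in idx))
    return rows


def insert_feature_row(seqid, start, end, strand=".", name="insert",
                       source="manual", feature_type="region"):
    """Build one tab-separated GFF3 feature line (1-based, inclusive)."""
    return "\t".join([seqid, source, feature_type, str(start), str(end),
                      ".", strand, ".", "Name=" + name])
```

Extracting rows this way does not fix the coordinates downstream of an insertion; it only covers the "pull columns and add a row" part.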
Thanks, Fatima. The thing is that since I inserted a small piece of sequence into E. coli MG1655, all coordinates downstream of the insert have to change accordingly, so simply extracting the rows won't work. Rather than doing that manually, I was wondering whether, since I already have the GenBank file with the full annotation, I can somehow output a GFF file from it. I tried, but I was only able to output a tab-delimited file that IGV has trouble reading.
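Because a single insertion only moves features that lie downstream of it, the coordinate change can be scripted instead of done by hand. A minimal sketch, assuming you start from a GFF3 of the unmodified MG1655 genome (NCBI offers GFF3 downloads next to the GenBank record) and one insertion of `insert_len` bases placed right after original base `after_pos`. Features that straddle the insertion point are stretched and tagged with a hypothetical `spans_insertion=true` attribute so you can check them manually. IGV matches features to the genome by sequence name, so `new_seqid` should be the header name of your edited FASTA; a name mismatch of this kind is also a plausible (unconfirmed) reason the MACS2 peaks looked empty against the GenBank-loaded genome.

```python
def shift_gff(gff_text, after_pos, insert_len, new_seqid=None):
    """Shift GFF3 coordinates for one insertion of insert_len bases placed
    right after original base after_pos (1-based; 0 = before the first base).
    """
    out = []
    for line in gff_text.splitlines():
        fields = line.split("\t")
        # Copy directives, comments and non-feature lines unchanged. Note that a
        # ##sequence-region line will still carry the old genome length.
        if line.startswith("#") or len(fields) != 9:
            out.append(line)
            continue
        start, end = int(fields[3]), int(fields[4])
        if end <= after_pos:
            pass                      # entirely upstream: unchanged
        elif start > after_pos:
            start += insert_len       # entirely downstream: shift both ends
            end += insert_len
        else:
            end += insert_len         # straddles the insert: stretch and flag
            tag = "spans_insertion=true"
            attrs = fields[8]
            fields[8] = tag if attrs in ("", ".") else attrs + ";" + tag
        fields[3], fields[4] = str(start), str(end)
        if new_seqid:
            fields[0] = new_seqid     # must match the edited FASTA header
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

The result can be written to a .gff3 file and loaded in IGV on top of the edited FASTA, together with a row describing the insert itself.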
You can try FragGeneScan to do gene prediction on the modified sequence, and then combine the GFF file from FragGeneScan with the GenBank annotation you already have.
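One simple way to do the combining step, sketched under an assumed rule: the curated annotation (already converted to the new coordinates) takes priority, and FragGeneScan predictions are kept only where they overlap no curated feature on the same sequence, which in practice should mostly mean genes inside the insert. This rule is my assumption, not something FragGeneScan does itself. The overlap check is a plain pairwise loop, which is slow-ish but workable for a few thousand genes per file.

```python
def parse_features(gff_text):
    """Return GFF3 feature lines as lists of nine fields."""
    feats = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 9:
            feats.append(fields)
    return feats


def overlaps(a, b):
    """True if two features share a seqid and at least one base."""
    return a[0] == b[0] and int(a[3]) <= int(b[4]) and int(b[3]) <= int(a[4])


def merge_gff(curated_text, predicted_text):
    """Keep all curated features; add predictions overlapping none of them."""
    curated = parse_features(curated_text)
    predicted = parse_features(predicted_text)
    extra = [p for p in predicted if not any(overlaps(p, c) for c in curated)]
    merged = sorted(curated + extra, key=lambda f: (f[0], int(f[3]), int(f[4])))
    return "##gff-version 3\n" + "".join("\t".join(f) + "\n" for f in merged)
```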
Hi, try annotating your genome (the edited FASTA file) with Prokka. I know it would be easier to just correct the coordinates in the GenBank file with a text editor, but I don't really trust that such edits will work well. Prokka is very easy to use and can annotate a genome in under 20 minutes on a normal desktop. If your previous annotation contains curated data, you can feed your GenBank file to Prokka to improve the final result.
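A sketch of that Prokka run, wrapped in Python to match the other examples. `--outdir`, `--prefix`, `--kingdom`, `--cpus` and `--proteins` are Prokka options; `--proteins` takes a FASTA or GenBank file whose proteins are tried first, which is how the old GenBank annotation can be fed in. The file names below are hypothetical placeholders. Prokka writes a `<prefix>.gff` among its outputs, which can be tried as an IGV track over the edited FASTA (check that its sequence name matches the FASTA header).

```python
import shutil
import subprocess


def prokka_command(fasta, outdir, prefix, curated_gbk=None, cpus=4):
    """Build the Prokka argument list for annotating the edited genome."""
    cmd = ["prokka", "--outdir", outdir, "--prefix", prefix,
           "--kingdom", "Bacteria", "--cpus", str(cpus)]
    if curated_gbk:
        # Use proteins from the previous (curated) GenBank annotation first.
        cmd += ["--proteins", curated_gbk]
    cmd.append(fasta)
    return cmd


def run_prokka(cmd):
    """Run Prokka if it is on PATH; otherwise just print the command."""
    if shutil.which(cmd[0]) is None:
        print("prokka not found on PATH; command would be:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)
```

For example: run_prokka(prokka_command("MG1655_insert.fasta", "prokka_out", "MG1655_insert", curated_gbk="MG1655.gb")).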
Thanks for the suggestion, I'll try that!