You could use a command-line tool like faToTwoBit to convert your reference genome of interest into an indexed 2bit file, and then run command-line BLAT with your 2bit and FASTA files to query that reference. Where it finds matches, BLAT reports hits that include the chromosome name and start/stop positions, which you can parse into input for Circos. Both tools are part of the Kent Tools source code package.
I am currently trying tools like bowtie and bwa to index the reference
genome, Candidatus Kuenenia stuttgartiensis; however, the output .sam file
is not what I expected. I expected to see actual genome information,
but I see masses of characters like NNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN...
I don't think BLAT or BLAST includes Candidatus Kuenenia stuttgartiensis
(a bacterium).
The web version of BLAT very probably does not, but you can definitely build your own index files and query against them with the command-line tools, as long as you have the reference genome somewhere.
Yes, I have the reference genome. How could I build my own index and query
against it? Could you suggest some tools and explain how to do it? What are
the command-line tools? Are they part of Xcode on Mac?
You will need a compiler installed to build these tools. If you are using OS X, you will need to install Xcode and then install the command-line tools via that app. Then you can download the Kent Tools source code and compile it to get blat and faToTwoBit and other tools.
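Once the tools are compiled and on your PATH, the two steps can be scripted. Here is a minimal Python sketch, assuming the usual argument order (`faToTwoBit in.fa out.2bit` and `blat database query output.psl`). The file names kuenenia.fa, contigs.fa, and reference.2bit are placeholders I made up; swap in your own paths.

```python
import subprocess


def build_commands(reference_fa, query_fa,
                   two_bit="reference.2bit", out_psl="myHits.psl"):
    """Return the two command lines to run, in order.

    All file names here are hypothetical placeholders.
    """
    return [
        # Pack the FASTA reference into 2bit format.
        ["faToTwoBit", reference_fa, two_bit],
        # Align the query sequences against the 2bit reference, writing PSL.
        ["blat", two_bit, query_fa, out_psl],
    ]


def run_pipeline(reference_fa, query_fa):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(reference_fa, query_fa):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_pipeline() to execute them.
    for cmd in build_commands("kuenenia.fa", "contigs.fa"):
        print(" ".join(cmd))
```

Printing the commands first is a cheap way to check the paths before anything is actually run.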
The output file myHits.psl is in PSL format. You can convert to BED with psl2bed and cut -f1-3 to grab the first three columns, or just read the specs for PSL and use cut or awk etc. to grab those columns.
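If you would rather not install another tool, the same three columns can be pulled out with a few lines of Python. This is only a sketch based on the standard 21-column PSL layout, where tName, tStart, and tEnd are columns 14, 16, and 17 (indices 13, 15, 16 when counting from zero). The sample hit below is invented for illustration.

```python
def psl_to_bed3(lines):
    """Yield (target_name, target_start, target_end) for each PSL hit.

    PSL starts are 0-based and ends are exclusive, which matches BED,
    so the coordinates are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header and blank lines: data rows have 21 columns
        # and begin with a numeric match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[13], int(fields[15]), int(fields[16])


if __name__ == "__main__":
    # A made-up hit, built field by field so the tabs are explicit.
    example = "\t".join([
        "100", "0", "0", "0", "0", "0", "0", "0", "+",
        "contig1", "100", "0", "100",
        "chrK", "5000", "1200", "1300",
        "1", "100,", "0,", "1200,",
    ])
    for name, start, end in psl_to_bed3([example]):
        print(name, start, end, sep="\t")
```

For a real file you would pass an open file handle, for example `psl_to_bed3(open("myHits.psl"))`.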
I am working on an ANAMMOX project. I don't need to compare the Kuenenia stuttgartiensis genome against the human genome sequence.
Do you have experience with bwa and Samtools? I generated the *.sam file, but it has a lot of gaps. I need to remove the gaps and find the start and end positions of each alignment.
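For the start and end positions, one possible approach is to read them straight from the SAM text: column 4 (POS) is the 1-based leftmost mapped position, and the end can be worked out by adding up the CIGAR operations that consume reference bases (M, D, N, = and X). The Python sketch below does that and skips unmapped reads (flag 0x4). The function names and the sample records are made up for illustration, and it does not try to interpret what the "gaps" in your file are.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def alignment_span(sam_line):
    """Return (reference_name, start, end), 1-based inclusive, or None if unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read, or no alignment recorded
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return rname, pos, pos + ref_len - 1


def spans(lines):
    """Yield a span for every mapped record, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        span = alignment_span(line)
        if span is not None:
            yield span


if __name__ == "__main__":
    # Two invented records: one mapped, one unmapped.
    mapped = "read1\t0\tchrK\t101\t60\t5S40M2D10M3I20M\t*\t0\t0\t*\t*"
    unmapped = "read2\t4\t*\t0\t0\t*\t*\t0\t0\tNNNN\t*"
    for span in spans(["@HD\tVN:1.6", mapped, unmapped]):
        print(span)
```

In the mapped example, the reference-consuming lengths are 40 + 2 + 10 + 20 = 72, so the alignment covers positions 101 to 172; the soft clip (5S) and the insertion (3I) do not move along the reference.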
Your first link does not work; can you please post a working link to your contig data?
Hey Alex,
The link works on my computer (I just tried it). Here it is: http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AMCG01#contigs
Click any FASTA link on the right to get the data.
Thanks a lot! I am a Computer Science person, new to Bioinformatics. Any advice would be appreciated!