fasta index file too big for genome browsers
3.2 years ago
setschmann ▴ 10

Hi,

I have a very fragmented reference genome (https://www.g3journal.org/content/9/7/2039). samtools generates a huge .fai index file of 1.35 GB, which is way too big to load into JBrowse 2.

How can I "simplify" the FASTA or the .fai file? What tools could I use?

index fasta fai jbrowse2 • 1.6k views

You could try scaffolding if it has a close relative with a more contiguous assembly. That would help reduce the number of contigs.


The index file is 1.35G? That seems really odd. What was the exact command you used to generate the index?

samtools faidx name.fa

The problem is that the genome is very fragmented.

These are the QUAST results:

########
QUAST Results
########

All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

Assembly                    Abal.1_1   
# contigs (>= 0 bp)         37192295   
# contigs (>= 1000 bp)      1276678    
# contigs (>= 5000 bp)      529013     
# contigs (>= 10000 bp)     343016     
# contigs (>= 25000 bp)     145508     
# contigs (>= 50000 bp)     46234      
Total length (>= 0 bp)      18167382048
Total length (>= 1000 bp)   13017811908
Total length (>= 5000 bp)   11361640463
Total length (>= 10000 bp)  10034318481
Total length (>= 25000 bp)  6872368770 
Total length (>= 50000 bp)  3406852776 
# contigs                   1887964    
Largest contig              297427     
Total length                13450974050
GC (%)                      38.76      
N50                         25814      
N75                         9780       
L50                         139726     
L75                         348468     
# N's per 100 kbp           1703.76  
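
As a rough sanity check on the index size (a back-of-the-envelope sketch, not something measured from the actual file): a .fai record is one line per sequence with five tab-separated fields (name, length, offset, bases per line, bytes per line). Dividing the reported 1.35 GB by the 37,192,295 contigs above gives the average record size. The contig name, offset and 60-column line width in the example record are made-up values for illustration.

```python
def fai_record_bytes(name, length, offset, linebases=60, linewidth=61):
    """Size in bytes of one .fai line: five tab-separated fields plus a newline."""
    fields = (name, str(length), str(offset), str(linebases), str(linewidth))
    return len("\t".join(fields).encode("utf-8")) + 1  # +1 for the trailing newline


# Figures taken from the thread.
N_CONTIGS = 37_192_295   # "# contigs (>= 0 bp)" in the QUAST report
INDEX_BYTES = 1.35e9     # reported .fai size, assuming decimal gigabytes

avg_record = INDEX_BYTES / N_CONTIGS

# Hypothetical record: the name, length and offset are invented for illustration.
example = fai_record_bytes("contig_12345678", 500, 9_000_000_000)

print(f"average bytes per record: {avg_record:.1f}")
print(f"example record size: {example} bytes")
```

The average comes out at a few dozen bytes per record, which is about what a single ordinary .fai line occupies, so the index size follows from the contig count alone rather than from a problem with the command.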

From Istvan's reply and yours, it looks like you're doing things right. Like he says, a graphical tool might not work best for your requirement. See if you can either work with a different tool or tweak your requirement.


I'll try removing contigs shorter than 1000 bp and see how it goes.
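
A minimal sketch of that length filter in plain Python, assuming a standard multi-line FASTA. The function names `read_fasta` and `filter_by_length` and the 60-column output width are my own choices for illustration, not part of any tool mentioned here. The demo runs on a tiny in-memory FASTA so it can be tried without the real genome.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(src, dst, min_len=1000, width=60):
    """Write records with at least min_len bases to dst; return how many were kept."""
    kept = 0
    for header, seq in read_fasta(src):
        if len(seq) >= min_len:
            dst.write(f">{header}\n")
            # Re-wrap the sequence at a fixed width so the output can be indexed.
            for i in range(0, len(seq), width):
                dst.write(seq[i:i + width] + "\n")
            kept += 1
    return kept


# Tiny in-memory demo with a threshold of 5 bp instead of 1000 bp.
demo_in = io.StringIO(">short\nACGT\n>long\nAC\nGT\nAC\n")
demo_out = io.StringIO()
demo_kept = filter_by_length(demo_in, demo_out, min_len=5)
print(demo_kept, repr(demo_out.getvalue()))
```

For the real file, the same function can be called with two open file handles, and the filtered FASTA then needs a fresh `samtools faidx` run.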


That's still over a million contigs. Genome browsers are typically built to deal with contigs numbering in the tens (<50). I hope this subset works out for you.

3.2 years ago

One might say that there is no point in loading a genome with 37 million contigs into a browser: many of the expected GUI widgets would be inoperable.

For example, the graphical dropdown widget for selecting chromosomes would now have 37 million entries.

Instead, select a few dozen interesting contigs where it would make sense to look at the genome, and visualize those.


Okay, but I would need to import it as a reference genome; a few contigs wouldn't be enough...


Make yourself a smaller reference genome containing just the contigs you are interested in.
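
One way this could look, as a sketch only: stream through the FASTA once and copy the records whose ID is in a list of contigs of interest. The function name `subset_fasta` and the contig IDs in the demo are hypothetical; the ID is taken to be the first whitespace-delimited word of the header line, which is also how samtools names sequences in the index.

```python
import io


def subset_fasta(src, dst, wanted):
    """Copy records whose ID is in `wanted` from src to dst; return the IDs found."""
    wanted = set(wanted)
    found = set()
    keep = False
    for line in src:
        if line.startswith(">"):
            # The ID is the first word after '>'; the rest is a free-text description.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = seq_id in wanted
            if keep:
                found.add(seq_id)
        if keep:
            dst.write(line)  # sequence lines are copied unchanged
    return found


# In-memory demo with made-up contig names.
demo_src = io.StringIO(">c1 first contig\nAAAA\nCC\n>c2\nGGGG\n>c3\nTTTT\n")
demo_dst = io.StringIO()
demo_found = subset_fasta(demo_src, demo_dst, ["c1", "c3", "missing"])
print(sorted(demo_found))
print(demo_dst.getvalue(), end="")
```

Comparing the returned set with the requested IDs shows which names were not present in the file. For a short list of contigs, `samtools faidx name.fa contig_name > subset.fa` does the same job without any script; either way, the smaller FASTA has to be indexed again before loading it into the browser.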
