Split fasta file into chromosomes
8.8 years ago
natasha ▴ 110

Hi

I have 105 bacterial isolates which are assembled into contigs. I also have a really good, fully assembled, reference genome, consisting of 2 chromosomes.

I would like to align each of my isolates to the reference genome, determine which contigs/sequences belong to which chromosome, and then split every isolate's fasta sequence into 2 chromosomes.

I have been told that Mauve is good for this. However, I have 105 isolates and Mauve doesn't seem to be able to cope with this much data at once. I could align smaller groups to the reference genome at a time, but is there another way/tool to do this?

Thanks

chromosome contigs fasta core genome • 4.2k views

Do you only want to split the reference into two chromosome files so you can use mauve on them separately?

No, I know how to split my reference into two chromosome files... I need to split all of my 105 isolates into two chromosome files.

Based on what criteria? What format are the files currently in?

Fasta format.

I either want to split the contigs or just crudely split the fasta sequence, by aligning them to the reference genome.

Have you tried running Mauve against the two chromosomes independently? Assuming there is no significant homology between the two chromosomes, that would let you identify which contigs from each isolate belong to which chromosome. You can then extract those contigs using a program called faSomeRecords from Kent Utilities. That may be a lot of Mauve runs, but it can work.

Another option is to try lastz, which was designed for chromosome-sized sequences, to identify the contigs you need.
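
Once you know which contig belongs to which chromosome, the splitting step itself is simple. Below is a minimal Python sketch of that step, not a replacement for faSomeRecords. It assumes you have already built a contig-to-chromosome mapping from your alignments; the function names and the `assignment` dictionary are hypothetical and only for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_assignment(records, assignment):
    """Group records by chromosome; contigs missing from the mapping
    are collected under the key 'unassigned'."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[assignment.get(name, "unassigned")].append((name, seq))
    return dict(groups)


def write_fasta(records, path, width=60):
    """Write (name, sequence) pairs to path, wrapping sequence lines."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
```

For one isolate you would open its fasta file, pass the file handle to `read_fasta`, group the records with `split_by_assignment`, and call `write_fasta` once per group. Keeping an "unassigned" group makes it easy to spot contigs that did not align to either chromosome.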

Sorry, I'm new to this, but do you know how to use the file that Mauve produces (from 5' to 3') to work out which contigs belong to chromosome 1 and which to chromosome 2?

Maybe Satsuma is a good option for your task. Satsuma is a tool that reliably aligns large and complex DNA sequences, providing maximum sensitivity, specificity and speed. I've used it to align contigs against well-assembled related genomes and sort the contigs according to their hit location on the reference.
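
To illustrate the sorting idea, here is a rough Python sketch that assigns each contig to the chromosome where it has the most aligned bases and then orders contigs by their leftmost hit. The input is a hypothetical list of `(contig, chromosome, ref_start, aligned_length)` tuples. This is not the native output format of Satsuma, lastz or Mauve, so you would need to parse your aligner's output into this shape first.

```python
from collections import defaultdict


def assign_contigs(hits):
    """Return {contig: (chromosome, leftmost_ref_start)}.

    hits is an iterable of (contig, chromosome, ref_start, aligned_length).
    Each contig is assigned to the chromosome with the largest total
    aligned length across all of its hits.
    """
    totals = defaultdict(lambda: defaultdict(int))
    leftmost = {}
    for contig, chrom, ref_start, aligned in hits:
        totals[contig][chrom] += aligned
        key = (contig, chrom)
        leftmost[key] = min(ref_start, leftmost.get(key, ref_start))
    result = {}
    for contig, per_chrom in totals.items():
        best = max(per_chrom, key=per_chrom.get)
        result[contig] = (best, leftmost[(contig, best)])
    return result


def order_contigs(assignment):
    """Return {chromosome: [contigs sorted by leftmost reference start]}."""
    ordered = defaultdict(list)
    by_start = sorted(assignment.items(), key=lambda item: item[1][1])
    for contig, (chrom, _start) in by_start:
        ordered[chrom].append(contig)
    return dict(ordered)
```

Summing aligned length per chromosome is a simple heuristic. A contig that spans a rearrangement or hits repeats on both chromosomes could still be misassigned, so borderline cases are worth checking by hand.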
