6.4 years ago
ThePresident
I have paired-end fastq files; however, my reference genome is a list of contigs. The reference fasta file looks something like this:
>contig1
AGTGCAGAC.....
>contig2
GCGATCACA......
>contig3
....
Is there a way to instruct bwa to perform the alignment on 'contig1', then move on to 'contig2', and so on? Concatenating the contigs into one single fasta file is not an option, as I'll have random pairing of contigs, and paired-end reads might be mapped to different contigs, producing all sorts of funny stuff.
I tried looking for this on previous posts but couldn't find anything similar.
TP
What do you mean by "random pairing of contigs"?
BWA works on multi-contig fasta files; in fact, most reference genomes are multi-contig fasta files. I don't understand what the problem is, or maybe I don't understand what you want to do.
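To make the point about multi-contig fasta files concrete, here is a minimal Python sketch. It assumes the reference looks like the example in the question; the helper names `read_fasta` and `bwa_commands` and the file names are made up for illustration. It shows that each `>` header starts its own record, and that the usual `bwa index` / `bwa mem` calls take the whole multi-contig file as a single reference.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_name, sequence) for every record of a multi-FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record; contigs are never joined.
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def bwa_commands(reference: str, reads1: str, reads2: str) -> List[List[str]]:
    """Build (but do not run) the index and paired-end alignment commands."""
    return [
        ["bwa", "index", reference],
        ["bwa", "mem", reference, reads1, reads2],
    ]


if __name__ == "__main__":
    # Toy reference with hypothetical sequences, mirroring the question's layout.
    demo = [">contig1", "AGTGCAGAC", ">contig2", "GCGATCACA"]
    for contig, seq in read_fasta(demo):
        print(contig, len(seq))
    # Hypothetical file names; bwa mem writes SAM to standard output.
    for cmd in bwa_commands("contigs.fa", "reads_1.fq", "reads_2.fq"):
        print(" ".join(cmd))
```

The commands are only printed here; to actually run them you could pass each list to `subprocess.run` and redirect the output of `bwa mem` to a SAM file.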
What I meant by "random pairing of contigs" is that the contigs probably won't be in the correct order (compared to the reference sequence), and some of them might be reversed compared to the reference sequence, i.e. the actual genome. When paired-end reads are aligned to these incorrectly joined contigs, they might produce discordant pairs (outward-oriented reads and such). Hence the idea of aligning independently to each contig instead of to the concatenated contigs.
I didn't know that bwa could work on multi-fasta files. Does indexing work the same way?

When you concatenate several fasta sequences into a single multi-fasta file, all contigs remain separate from each other. If the two reads of a pair map to different contigs, the result is a discordant pair regardless of contig orientation. This is a by-product of incomplete assemblies, and there is nothing you can do about it, except get a better assembly somehow.
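If you want to see how many pairs are affected, one option is to scan the SAM output for pairs whose mates landed on different contigs. The sketch below is a rough illustration, not a replacement for a proper tool: the function name `split_contig_pairs` and the demo records are made up, and it relies only on the standard SAM columns (FLAG in column 2, RNAME in column 3, RNEXT in column 7, where `=` means "same contig as this read").

```python
from typing import Iterable, List

PAIRED = 0x1          # read is part of a pair
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # its mate is unmapped
NOT_PRIMARY = 0x900   # secondary (0x100) or supplementary (0x800) alignment


def split_contig_pairs(sam_lines: Iterable[str]) -> List[str]:
    """Return sorted names of read pairs whose mates map to different contigs."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag = fields[0], int(fields[1])
        rname, rnext = fields[2], fields[6]
        if not flag & PAIRED:
            continue
        if flag & (UNMAPPED | MATE_UNMAPPED | NOT_PRIMARY):
            continue
        # "=" means the mate is on the same contig; "*" means unknown.
        if rnext not in ("=", "*") and rnext != rname:
            names.add(qname)
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical records: pairA maps within contig1, pairB spans two contigs.
    demo = [
        "@SQ\tSN:contig1\tLN:1000",
        "@SQ\tSN:contig2\tLN:1000",
        "pairA\t99\tcontig1\t100\t60\t9M\t=\t300\t209\tAGTGCAGAC\t*",
        "pairA\t147\tcontig1\t300\t60\t9M\t=\t100\t-209\tAGTGCAGAC\t*",
        "pairB\t65\tcontig1\t900\t60\t9M\tcontig2\t50\t0\tAGTGCAGAC\t*",
        "pairB\t129\tcontig2\t50\t60\t9M\tcontig1\t900\t0\tGCGATCACA\t*",
    ]
    print(split_contig_pairs(demo))
```

Pairs reported this way are the ones that look discordant only because the assembly is fragmented; they say nothing about which contigs should be adjacent or in which orientation.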