Question

BFAST Match Argument: Regroup Multiple Read Files for Final Alignment

0

13.7 years ago

Francois Olivier Hébert ▴ 280

I would like to align reads from multiple samples to the same contigs from a de novo assembly. I am working on a non-model species and I don't have a reference genome. Consequently, I made a de novo assembly with reads from a sequence capture chip. The chip contains approximately 3000 genes, and this is what I assembled. I have a little more than 50000 contigs to which I would like to align Illumina paired-end reads (100 bp). Those reads are split across 24 files containing approximately 30 million reads each. Each file represents an individual from one of 2 different populations, and they are all individually tagged.

I know the genome of my species contains a lot of repeated sequences, so I decided to use BFAST to align the reads on the contigs. I am currently in step 3, which is the "match" argument (finding CALs). My problem is that at the end of the process of aligning the reads, I would like to have a file containing the contigs and the reads of ALL THE INDIVIDUALS aligned on these contigs.

My question is: how can I do this if I align the reads separately for each individual? (I know BFAST works better with files that are not too big, i.e., with a few million reads.)

Will I have a separate alignment file for each individual? Could I merge these files somewhere in the process, or at the end?

The final goal would be to find SNPs and conduct different population genomics analyses with the results.

Can anybody help me with this? I am just starting to work with these tools, and I am definitely not an expert in bioinformatics. :S

Thank you VERY MUCH !

next-gen sequencing alignment multiple • 3.3k views

ADD COMMENT • link updated 11.5 years ago by Biostar 20 • written 13.7 years ago by Francois Olivier Hébert ▴ 280

0

I launched the task using a simple loop in bash. It has now been running for 2 days and 3 files out of 24 have been processed, but I don't have any output files in the folder yet. Is this normal? I'm thinking that maybe the output files will appear once everything is done, but it doesn't look good so far. :S

ADD REPLY • link 13.7 years ago by Francois Olivier Hébert ▴ 280

0

Alright, I solved the problem. I wasn't sure I had to do this, but I just redirected the output of each iteration of the loop into the correct output file. Everything's fine.
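
For readers hitting the same issue, here is a minimal sketch of that kind of loop, assuming the alignment step writes its results to standard output. The `run_step` function, the directory layout, and the file names are hypothetical stand-ins so the sketch runs on its own; in real use, the body of `run_step` would be the actual BFAST command for that step.

```shell
set -eu

# Hypothetical layout: one reads file per individual, created here for the demo.
workdir=$(mktemp -d)
mkdir "$workdir/reads" "$workdir/out"
printf '@r1\nACGT\n+\nIIII\n' > "$workdir/reads/ind01.fastq"
printf '@r2\nTTGA\n+\nIIII\n' > "$workdir/reads/ind02.fastq"

# Hypothetical stand-in for the real aligner call. It only prints a line to
# standard output, which is where the real tool is assumed to write its results.
run_step() {
    printf 'processed %s\n' "$(basename "$1")"
}

# Redirect each individual's output to its own file. Without the redirection,
# the results would only scroll past on the terminal and nothing would be saved.
for reads in "$workdir"/reads/*.fastq; do
    sample=$(basename "$reads" .fastq)
    run_step "$reads" > "$workdir/out/$sample.matches"
done

ls "$workdir/out"
```

Naming each output after its input file keeps the individuals separate, so the per-individual results can still be told apart or merged later.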

ADD REPLY • link 13.7 years ago by Francois Olivier Hébert ▴ 280

score 2 · Answer 1 · 2012-01-06

2

13.7 years ago

Manu Prestat 4.1k

I am not sure I understand, but if you only want to concatenate your results, you could just use the GNU/Linux cat command.

cat * > myConcatResults

or, if your results share the same prefix (e.g. "myResults"):

cat myResults* > myConcatResults
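
As a small self-contained sketch of the prefix-based variant: the file names and contents below are made up for the demonstration, and the merged file is written to a separate directory (an assumption of this sketch, not something the commands above require) so that the glob can never pick up a merged file left over from an earlier run.

```shell
set -eu

# Hypothetical per-individual result files, created here only for the demo.
workdir=$(mktemp -d)
printf 'read1\nread2\n' > "$workdir/myResults_ind01.txt"
printf 'read3\n'        > "$workdir/myResults_ind02.txt"

# Keep the merged file outside the directory the glob matches.
mkdir "$workdir/merged"
cat "$workdir"/myResults_* > "$workdir/merged/myConcatResults"

# Sanity check: the merged file should hold as many lines as the inputs combined.
merged_lines=$(wc -l < "$workdir/merged/myConcatResults")
input_lines=$(cat "$workdir"/myResults_* | wc -l)
echo "merged: $merged_lines lines, inputs: $input_lines lines"
```

Plain concatenation like this only makes sense for line-oriented text output; whether it suits a given alignment format depends on that format, so it is worth inspecting a result file first.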

ADD COMMENT • link 13.7 years ago by Manu Prestat 4.1k

0

Yeah, I thought of doing this, but I don't know what the files look like, so I wasn't sure it would work. I'll try this when it's finished.

ADD REPLY • link 13.7 years ago by Francois Olivier Hébert ▴ 280

0

Well, I'm not done yet, but it works fine. I wasn't sure it was possible to do this, but now I understand how the files are structured and what they look like. Thanks!

ADD REPLY • link 13.6 years ago by Francois Olivier Hébert ▴ 280