6.5 years ago
Sam
Dear Biostars
I have a FASTA file with about 100,000 DNA sequences, each about 300 bp long, and I want to align them to a target genome and get a BAM file as output. Could you suggest an appropriate script for this alignment?
Thanks
Just to give the context: the post "prepare a GFF file for MOCK fasta reference" is related and describes in a little more detail how the reference was created.
You should really provide the context yourself. Please take a few minutes to edit and revise your post so that it contains all the necessary information. Please be very specific and rephrase vague phrases like: "a target genome (which exactly? seemingly a synthetic reference)", "some GBS data (what is that?)", "a tree sample (which species of tree)", "some pipeline"... You might think that knowing the exact species, sequences and methods applied is not relevant to solving the problem, but that is absolutely not the case!
I suggest that we hold off a little until this is fixed.
Hello Sam,
This is a very basic question. What have you tried so far? What problems are you facing?
fin swimmer
I already tried bowtie2 with
-X 400 -I 100 --very-sensitive
but about 35% of the sequences could not be aligned. I think the issue is the length of the sequences and the seed region used in the alignment. What do you think?
I agree with finswimmer that this is a very basic question, and it seems you did not put a lot of effort into solving it yourself.
This information should have been in your initial question. Also, you should elaborate on how you obtain the data and which organism you are working on.
Also, please indicate why the data are in FASTA format.
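To make the "about 35%" figure mentioned above reproducible, here is a minimal Python sketch that counts unaligned records in SAM text. It assumes standard SAM flag semantics (0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary) and counts each query once by skipping secondary and supplementary records. The function name `unmapped_fraction` is made up for this example.

```python
from typing import Iterable, Tuple

# SAM flag bits (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def unmapped_fraction(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (total, unmapped, fraction) over primary SAM records."""
    total = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, @PG
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        # Count each query sequence only once
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
    fraction = unmapped / total if total else 0.0
    return total, unmapped, fraction
```

You could feed it the lines of a SAM file, or the output of `samtools view` on a BAM. `samtools flagstat` reports a similar summary directly, so this is only meant to show what is being counted.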
It's a mock reference (created by merging GBS data from a tree sample), which is why it is in FASTA format.
Wait, is the reference genome created by merging GBS data? Or is this merged GBS data what you are trying to map?
What is the reference genome? What is the plant species?
In GBS analysis, some pipelines make it possible to merge the GBS data from several samples to prepare a reference for SNP variant analysis. So I have a GBS-derived reference and I want to align it to a Populus reference genome. I think the issue here is the long sequences in the mock reference file, because I lost about 35% of the sequences during alignment. How can I set the alignment criteria for these long sequences to retain as many as possible?
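On the settings question, one thing that may be worth trying is bowtie2 in local mode. As far as I know, `-I` and `-X` set the minimum and maximum fragment length for paired-end alignment, so they should not affect unpaired sequences, and FASTA input needs `-f`. The sketch below only builds the command line; the file names and the index prefix in the usage note are placeholders, and whether local mode recovers more of the mock-reference sequences has not been tested here.

```python
from typing import List


def bowtie2_local_cmd(index_prefix: str, fasta: str, out_sam: str,
                      threads: int = 4) -> List[str]:
    """Build a bowtie2 command for unpaired FASTA input in local mode."""
    return [
        "bowtie2",
        "-f",                      # input sequences are FASTA, not FASTQ
        "--very-sensitive-local",  # local alignment preset (ends may be soft-clipped)
        "-p", str(threads),        # number of alignment threads
        "-x", index_prefix,        # index prefix built beforehand with bowtie2-build
        "-U", fasta,               # unpaired input file
        "-S", out_sam,             # SAM output file
    ]
```

For example, `subprocess.run(bowtie2_local_cmd("populus_idx", "mock_ref.fa", "mock_vs_populus.sam"), check=True)` would run it, assuming bowtie2 is on your PATH and the index exists.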
Maybe useful to others: I obtained better results with the BWA-MEM algorithm with default flags, so I think the issue was the alignment algorithm.
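For anyone who wants to reproduce the BWA-MEM route and end up with a BAM file, here is a rough Python sketch that wraps `bwa index`, `bwa mem` with default alignment settings, `samtools sort` and `samtools index`. It assumes `bwa` and `samtools` are installed and on the PATH. The function names are made up for this example, and this is not necessarily the exact command used above.

```python
import subprocess
from typing import List


def bwa_index_cmd(genome_fasta: str) -> List[str]:
    """Command to build the BWA index for the target genome."""
    return ["bwa", "index", genome_fasta]


def bwa_mem_cmd(genome_fasta: str, query_fasta: str, threads: int = 4) -> List[str]:
    """Command to align query sequences with default BWA-MEM scoring."""
    return ["bwa", "mem", "-t", str(threads), genome_fasta, query_fasta]


def samtools_sort_cmd(out_bam: str) -> List[str]:
    """Command to read SAM from stdin and write a coordinate-sorted BAM."""
    return ["samtools", "sort", "-o", out_bam, "-"]


def align_to_bam(genome_fasta: str, query_fasta: str, out_bam: str,
                 threads: int = 4) -> None:
    """Index the genome, align the queries, then sort and index the BAM."""
    subprocess.run(bwa_index_cmd(genome_fasta), check=True)
    # Stream SAM from bwa mem straight into samtools sort
    mem = subprocess.Popen(bwa_mem_cmd(genome_fasta, query_fasta, threads),
                           stdout=subprocess.PIPE)
    sort = subprocess.run(samtools_sort_cmd(out_bam), stdin=mem.stdout)
    mem.stdout.close()
    mem_rc = mem.wait()
    if mem_rc != 0 or sort.returncode != 0:
        raise RuntimeError("bwa mem or samtools sort failed")
    subprocess.run(["samtools", "index", out_bam], check=True)
```

A call such as `align_to_bam("populus_genome.fa", "mock_ref.fa", "mock_vs_populus.bam")` would run the whole chain; the file names are placeholders.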
I enjoyed your company