I have a list of short sequences and I want to obtain their coordinates, in other words to produce a BED file, by comparing them with a FASTA file that contains the original sequences.
Fasta file:
>PGH2
CGTAGCGGCTGAGTGCGCGGATAGCGCGTA
Short sequence fasta file:
>PGH2
CGGCTGAGT
Is there any way to obtain the coordinates? Bedtools doesn't help much.
Desired output:
PGH2 6 14
@allyson1115ar: I have a feeling that you are somewhat confused, or that your question is confusing me. The file format you have shown above is FASTA. FASTA holds sequence information, where the string after ">" describes what sort of sequence it is, e.g. a gene sequence, a sequenced read, or a contig. Coordinates of genes (or other features such as CDS, UTRs, etc.) that have previously been annotated are usually kept in GFF/GTF or BED files. It is good practice on forums to give a bit more background on what you are doing. At least provide the species name and say what PGH2 is: a gene, a contig, or something else?
If those are reads you got from sequencing and you are interested in identifying where those reads fall on the reference genome, then your first step is to align the reads to the reference genome. This will give a BAM file with the start and end coordinates of where your reads land in the reference genome.
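For the small example in the question, where the short sequence is an exact substring of the record with the same name, a plain string search may already be enough. The following is only a minimal sketch under that assumption: the helper names (read_fasta, find_exact, locate) are made up for illustration, it searches the forward strand only, and it tolerates no mismatches. Anything beyond that is a job for an aligner as described above. Note that the desired output "PGH2 6 14" uses 1-based inclusive coordinates, whereas a real BED file is 0-based and half-open, so the same hit would be written as "PGH2 5 14" in BED.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA lines into {name: sequence}; name is the first word after '>'."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def find_exact(reference, query):
    """Return the 0-based start of every exact match, overlapping ones included."""
    starts = []
    pos = reference.find(query)
    while pos != -1:
        starts.append(pos)
        pos = reference.find(query, pos + 1)
    return starts


def locate(references, queries, one_based=True):
    """Match each query against the reference record that shares its name.

    one_based=True gives 1-based inclusive coordinates (as in the question);
    one_based=False gives 0-based half-open coordinates (BED convention).
    """
    rows = []
    for name, query in queries.items():
        reference = references.get(name)
        if reference is None or not query:
            continue  # no reference with this name, or empty query
        for start in find_exact(reference, query):
            end = start + len(query)  # same number in both conventions
            rows.append((name, start + 1 if one_based else start, end))
    return rows


if __name__ == "__main__":
    # The two records from the question, inlined so the sketch is self-contained.
    reference_fasta = ">PGH2\nCGTAGCGGCTGAGTGCGCGGATAGCGCGTA\n"
    query_fasta = ">PGH2\nCGGCTGAGT\n"
    refs = read_fasta(StringIO(reference_fasta))
    queries = read_fasta(StringIO(query_fasta))
    for name, start, end in locate(refs, queries):
        print(name, start, end, sep="\t")  # prints PGH2, 6, 14 separated by tabs
```

To read real files instead of the inlined strings, pass open file handles to read_fasta. Calling locate with one_based=False gives rows that can be written straight to a BED file.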