Hi Biostars
Each of my original FASTQ reads has a restriction enzyme target sequence at its head
(e.g. GGCCTTATATACATCGATCAAGATA......; GGCC is the restriction enzyme target site), and the reads differ in length.
I would like to extract random reads from a whole-genome FASTA. Each random read should have the target site at its head and a read length corresponding to the original FASTQ reads.
(e.g. original: GGCCTTATATACATCGATCAAGATA, random: GGCCTTAAHCTAGATCGATCGCGT)
Currently, I do this random extraction with bedtools:
bedtools shuffle -i BED -g GENOME.FA > a.bed
bedtools getfasta -fi GENOME.FA -bed a.bed -fo b.fasta
Finally, I extract the FASTA reads that start with the restriction enzyme target site from b.fasta using grep.
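The filtering step described above could look roughly like the sketch below. It assumes b.fasta has one sequence line per record (which is how bedtools getfasta writes its output by default) and that the target site is GGCC. The function name `filter_by_site` is hypothetical, and awk is used instead of a bare grep so that each header line is kept with its sequence.

```shell
# filter_by_site SITE FASTA
# Print only the FASTA records whose sequence starts with SITE.
# Assumption: each record is exactly one header line plus one sequence line.
filter_by_site() {
    site="$1"
    fasta="$2"
    awk -v site="$site" '
        # Remember the most recent header line.
        /^>/ { header = $0; next }
        # Compare case-insensitively, because soft-masked genomes contain lowercase bases.
        toupper(substr($0, 1, length(site))) == toupper(site) {
            print header
            print $0
        }
    ' "$fasta"
}
```

Usage would be something like `filter_by_site GGCC b.fasta > c.fasta`, where c.fasta is just a placeholder output name. Note that this only checks the forward-strand sequence as extracted by getfasta.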
I mapped the original FASTQ reads with Bowtie2. The BED file contains these Bowtie2 mappings.
So this BED file has the sequence and length information corresponding to the original FASTQ reads.
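To show the length information that the BED file carries, here is a minimal sketch that tabulates interval lengths. It assumes standard BED columns (chrom, start, end) with 0-based, half-open coordinates, so each length is end minus start. The function name `bed_length_counts` is hypothetical.

```shell
# bed_length_counts BED
# Print "length<TAB>count" for every interval length in the BED file, sorted by length.
# Assumption: columns 2 and 3 are start and end (0-based, half-open).
bed_length_counts() {
    awk '
        BEGIN { OFS = "\t" }
        # Skip comment, track and browser lines if present.
        /^(#|track|browser)/ { next }
        { count[$3 - $2]++ }
        END { for (len in count) print len, count[len] }
    ' "$1" | sort -n
}
```

Running `bed_length_counts BED` gives the length distribution that the random reads are supposed to reproduce.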
But this process does not necessarily yield reads that meet my needs.
Does anyone have a more efficient idea that fits my needs?
My English is not good, but I hope it doesn't cause any trouble.
Thanks
Can you clarify what is inside the BED file? What do you mean by "finally extract fasta reads from b.fasta"? After running getfasta?
Thank you Goutham Atla. Sorry for not explaining enough.
I mapped the original FASTQ reads to the whole genome with Bowtie2. The BED you mentioned is the BED file of these mappings.
Yes, after running getfasta, I extract them with grep.
Thank you