Question

Random extraction of restriction enzyme target sites from whole genome

8.8 years ago

cfarmeri ▴ 210

Hi Biostars

My original FASTQ reads each begin with a restriction enzyme target sequence (e.g. GGCCTTATATACATCGATCAAGATA......; GGCC is the restriction enzyme target site), and the reads differ in length.

I would like to extract random reads from a whole-genome FASTA file. Each random read should have the target site at its head and a length corresponding to one of the original FASTQ reads.

(e.g. original: GGCCTTATATACATCGATCAAGATA, random: GGCCTTAAHCTAGATCGATCGCGT)
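
To make the goal concrete, here is a rough sketch (not a tested pipeline) of sampling such reads directly: it indexes every forward-strand occurrence of the site in the genome, then for each requested length picks one occurrence at random and prints the sequence starting there. The function name `sample_site_reads`, the one-length-per-line input file, and the seed argument are all assumptions made for illustration. It holds the whole genome in memory in awk and rescans each sequence with `substr`, so it is probably only practical for small genomes.

```shell
# sample_site_reads SITE GENOME.FA LENGTHS.TXT [SEED]   (hypothetical helper)
# LENGTHS.TXT is assumed to hold one read length per line.
# Only forward-strand occurrences of SITE are considered.
sample_site_reads() {
    site="$1"; genome="$2"; lengths="$3"; seed="${4:-42}"
    awk -v site="$site" -v seed="$seed" '
        BEGIN { srand(seed); site = toupper(site); n = 0 }
        # First file: genome FASTA. Join wrapped lines into one string per record.
        FNR == NR {
            if ($0 ~ /^>/) { name = substr($1, 2) }
            else { seq[name] = seq[name] toupper($0) }
            next
        }
        # On the first line of the second file, index every site occurrence once.
        !indexed {
            for (name in seq) {
                s = seq[name]; offset = 0
                while ((i = index(s, site)) > 0) {
                    n++; chrom[n] = name; start[n] = offset + i   # 1-based start
                    s = substr(s, i + 1); offset += i
                }
            }
            indexed = 1
        }
        # Second file: for each length, pick a random site that leaves enough room.
        n > 0 && $1 > 0 {
            for (try = 0; try < 1000; try++) {
                k = int(rand() * n) + 1
                if (start[k] + $1 - 1 <= length(seq[chrom[k]])) {
                    # Header uses 0-based start and exclusive end, like BED.
                    printf ">%s:%d-%d\n%s\n", chrom[k], start[k] - 1, \
                        start[k] + $1 - 1, substr(seq[chrom[k]], start[k], $1)
                    break
                }
            }
        }
    ' "$genome" "$lengths"
}
```

Usage would look something like `sample_site_reads GGCC GENOME.FA lengths.txt > random.fasta`, where `lengths.txt` is a hypothetical file listing the lengths of the original reads.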

Currently, I do this random extraction with bedtools:

bedtools shuffle -i BED -g GENOME.FA > a.bed
bedtools getfasta -fi GENOME.FA -bed a.bed -fo b.fasta

Finally, I extract the FASTA reads that have the restriction enzyme target site from b.fasta with grep.
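
As an illustration of that last filtering step, a minimal sketch might look like the following. The function name `filter_by_site` is made up for this example, and it assumes each record in b.fasta is exactly two lines (header, then sequence on a single line). The comparison is case-insensitive so that soft-masked (lowercase) genome sequence is still matched.

```shell
# filter_by_site SITE FASTA   (hypothetical helper)
# Keep only records whose sequence starts with SITE.
# Assumes two-line FASTA records: a ">" header followed by one sequence line.
filter_by_site() {
    site="$1"    # e.g. GGCC
    fasta="$2"   # e.g. b.fasta
    awk -v site="$site" '
        /^>/ { header = $0; next }                      # remember the header
        toupper(substr($0, 1, length(site))) == toupper(site) {
            print header                                # sequence starts with the site:
            print $0                                    # emit header and sequence
        }
    ' "$fasta"
}
```

It could be called as `filter_by_site GGCC b.fasta > with_site.fasta`, where `with_site.fasta` is just a placeholder output name.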

I mapped the original FASTQ reads with Bowtie2. The BED file above is the BED file of those Bowtie2 mappings.

So this BED file carries the position and length information corresponding to the original FASTQ reads.

However, this process does not necessarily produce reads that meet my needs.

Does anyone have a more efficient idea that fits my needs?

My English is not good, but I hope it doesn't cause any trouble.

Thanks

genome • 2.0k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 8.8 years ago by cfarmeri ▴ 210

Can you clarify what is inside the BED file? What do you mean by "finally extract fasta reads from b.fasta"? After running getfasta?

ADD REPLY • link 8.8 years ago by GouthamAtla 12k

Thank you, Goutham Atla. Sorry for not explaining enough.

I mapped the original FASTQ reads to the whole genome with Bowtie2. The BED file you asked about is the BED file of those mappings.

"What do you mean by 'finally extract fasta reads from b.fasta'? After running getfasta?"

Yes, after running getfasta, I extract them with grep.

Thank you

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 8.8 years ago by cfarmeri ▴ 210