Random extraction of restriction enzyme target sites from whole genome
8.8 years ago
cfarmeri ▴ 210

Hi Biostars

My original fastq reads each begin with a restriction enzyme target sequence (e.g. GGCCTTATATACATCGATCAAGATA......; GGCC is the restriction enzyme target site), and the reads differ in length.

I would like to extract random reads from a whole-genome fasta file. Each random read should start with the target site and have a length matching one of the original fastq reads.

(e.g. original: GGCCTTATATACATCGATCAAGATA, random: GGCCTTAAHCTAGATCGATCGCGT)
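A minimal sketch of the goal described above, assuming the genome fits in memory as a dict of chromosome name to sequence and that only forward-strand occurrences of the site are wanted. The names `find_site_starts` and `sample_site_reads` are hypothetical, not part of any existing tool.

```python
import random
import re


def find_site_starts(genome, site):
    """Return (chrom, start) for every forward-strand occurrence of site."""
    # A lookahead pattern also reports overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(site.upper()) + ")")
    starts = []
    for chrom, seq in genome.items():
        for match in pattern.finditer(seq.upper()):
            starts.append((chrom, match.start()))
    return starts


def sample_site_reads(genome, site, lengths, seed=None, max_tries=1000):
    """Draw one random site-anchored read for each requested length."""
    rng = random.Random(seed)
    starts = find_site_starts(genome, site)
    if not starts:
        raise ValueError("site not found in genome")
    reads = []
    for length in lengths:
        # Rejection sampling: retry when the read would run off the chromosome end.
        for _ in range(max_tries):
            chrom, start = rng.choice(starts)
            if start + length <= len(genome[chrom]):
                reads.append(genome[chrom][start:start + length].upper())
                break
        else:
            raise ValueError(f"no site leaves room for a read of length {length}")
    return reads


# Toy genome for illustration only.
genome = {"chr1": "AAGGCCTTTTGGCCAAAA", "chr2": "ggccacgt"}
reads = sample_site_reads(genome, "GGCC", [6, 8, 5], seed=1)
```

Because positions are drawn only from actual site occurrences, every returned read starts with the site and no later filtering step is needed. Reverse-strand sites would need the reverse complement handled separately.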

Currently, I do this random extraction with bedtools.

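# -g takes a chromosome-sizes file; the fasta index (e.g. from samtools faidx GENOME.FA) works here
bedtools shuffle -i BED -g GENOME.FA.fai > a.bed
bedtools getfasta -fi GENOME.FA -bed a.bed -fo b.fasta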

Finally, I extract the fasta reads that contain the restriction enzyme target site from b.fasta with grep.
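The filtering step can also be sketched in Python, which makes it easy to require the site at the very start of the sequence rather than anywhere in the line. This is only an illustration; `parse_fasta` and `keep_site_reads` are hypothetical helper names, and the input is assumed to be the lines of a fasta file such as b.fasta.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_site_reads(lines, site="GGCC"):
    """Keep only records whose sequence begins with the site (case-insensitive)."""
    site = site.upper()
    return [(h, s) for h, s in parse_fasta(lines) if s.upper().startswith(site)]


# Toy records for illustration only.
records = [
    ">chr1:2-10", "GGCCTTTT",
    ">chr1:5-13", "ACGTGGCC",
    ">chr2:0-8", "ggcc", "acgt",
]
kept = keep_site_reads(records)
```

With a real file, the same function could be called as `keep_site_reads(open("b.fasta"))`. Since randomly shuffled intervals rarely begin with the site, most records are expected to be discarded at this step.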

I mapped the original fastq reads with Bowtie2. The BED file is the result of this Bowtie2 mapping.

So this BED file has sequence and length information corresponding to the original fastq reads.
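As a small illustration of how the length information can be read back out of such a BED file: BED intervals are 0-based and half-open, so each interval length is end minus start. The helper name `bed_lengths` is hypothetical, and the mapped span is only an approximation of the read length when an alignment contains gaps.

```python
def bed_lengths(lines):
    """Return the length (end - start) of every interval in BED-formatted lines."""
    lengths = []
    for line in lines:
        # Skip blank lines and the optional header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        lengths.append(end - start)
    return lengths


# Toy intervals for illustration only.
example = [
    "track name=mapped_reads",
    "chr1\t10\t35\tread1\t42\t+",
    "chr2\t0\t20",
]
lengths = bed_lengths(example)
```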

But this process does not necessarily yield reads that meet my needs.

Does anyone have a more efficient approach that fits my needs?

My English is not good, but I hope it doesn't cause any trouble.

Thanks

genome • 2.0k views

Can you clarify what is inside the BED file? What do you mean by "finally extract fasta reads from b.fasta"? After running getfasta?


Thank you, Goutham Atla. Sorry for not explaining enough.

I mapped the original fastq reads to the whole genome with Bowtie2. The BED file you asked about is the BED file from this mapping.

"What do you mean by "finally extract fasta reads from b.fasta"? After running getfasta?"

Yes, after running getfasta, I extract them with grep.

Thank you
