Finding specific k-mer in human genome
8.1 years ago
jye

I want to find every occurrence of a specific 9-mer (GATCGATGC) in the human genome and export the matches to a BED file with the chromosome, start and end position of each one. Many tools, such as Jellyfish and DSK, can only count k-mer occurrences and can't report where each k-mer occurs. Does anybody know how to do this? Any suggestion would be greatly appreciated.

Do you mean you just want to search for the string "GATCGATGC" across the genome FASTA and get the coordinates?
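
For the exact-match case, a plain scan of the genome FASTA is enough. Below is a minimal Python sketch of that idea; the script name find_kmer.py and its helper functions are hypothetical, not part of any package. It reports every exact, possibly overlapping occurrence on both strands as BED6 rows (0-based start, exclusive end). It assumes an uncompressed FASTA and holds one chromosome at a time in memory, which means a few hundred MB for the largest human chromosomes.

```python
import re
import sys
from typing import Iterable, Iterator, List, Tuple

# Complement table for A/C/G/T/N in both cases.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

BedRow = Tuple[str, int, int, str, int, str]


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_kmer(chrom: str, seq: str, kmer: str) -> List[BedRow]:
    """Return BED6 rows for every exact, possibly overlapping match of kmer
    on either strand, sorted by start position."""
    seq, kmer = seq.upper(), kmer.upper()  # soft-masked (lowercase) bases still match
    targets = {kmer: "+"}
    rc = revcomp(kmer)
    if rc != kmer:  # a reverse-complement palindrome is reported once, as "+"
        targets[rc] = "-"
    hits = []
    for target, strand in targets.items():
        # A zero-width lookahead lets the regex report overlapping matches.
        for m in re.finditer(f"(?={re.escape(target)})", seq):
            hits.append((chrom, m.start(), m.start() + len(kmer), kmer, 0, strand))
    return sorted(hits, key=lambda h: (h[1], h[5]))


def main(fasta_path: str, kmer: str) -> None:
    with open(fasta_path) as fh:
        for chrom, seq in read_fasta(fh):
            for row in find_kmer(chrom, seq, kmer):
                print("\t".join(map(str, row)))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage (the FASTA file name is a placeholder): `python find_kmer.py hg38.fa GATCGATGC > GATCGATGC.bed`. Reverse-strand hits keep the query k-mer as their BED name and get strand "-".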

This is probably the best approach: if a read starts with "ATCGATGC" (missing the leading G), it is probably still relevant to you. It is therefore probably best to find the genomic regions containing GATCGATGC and then count the reads that fall anywhere over those regions, rather than doing the much more expensive search for GATCGATGC within the reads themselves (allowing for mismatches, etc.).
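
The "count the reads that fall over those regions" step can be sketched in Python. This is an illustrative sketch only: count_overlaps is a hypothetical helper, and it assumes both the reads and the GATCGATGC regions are already available as (chrom, start, end) tuples in BED-style 0-based, half-open coordinates. A read counts if it overlaps a region by at least one base.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def count_overlaps(regions: Sequence[Interval], reads: Sequence[Interval]) -> List[int]:
    """For each region, count the reads that overlap it by at least one base."""
    # Group read intervals by chromosome and sort them by start.
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in reads:
        by_chrom[chrom].append((start, end))
    starts: Dict[str, List[int]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _ in ivs]
    # The longest read bounds how far to the left an overlapping read can start.
    max_len = max((e - s for _, s, e in reads), default=0)

    counts = []
    for chrom, r_start, r_end in regions:
        ivs, st = by_chrom.get(chrom, []), starts.get(chrom, [])
        lo = bisect_left(st, r_start - max_len + 1)
        hi = bisect_left(st, r_end)
        # Reads in ivs[lo:hi] start before the region ends; keep those
        # that also end after the region starts.
        counts.append(sum(1 for s, e in ivs[lo:hi] if e > r_start))
    return counts
```

For genome-scale read sets, an established interval tool such as `bedtools intersect -c` does the same kind of counting far more efficiently; the sketch only shows the overlap logic.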

Yes. That's what I want to do.

Perhaps you can simply use (and adapt) one of the AWK commands I posted in a previous answer: "Correct statistical test to determine the significance of nucleotides present".

8.1 years ago
Asaf

EMBOSS has a tool called fuzznuc; you can run it in Galaxy and then convert the output to the format you need. Fuzznuc supports several output formats, such as table and GFF, and one of them should work for you.
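
If you choose the GFF output, the conversion to BED is mostly a coordinate shift: GFF is 1-based with both ends included, while BED is 0-based and half-open. A minimal Python sketch of such a converter follows. gff_to_bed is a hypothetical helper, and it assumes standard 9-column, tab-separated GFF feature lines rather than any particular fuzznuc-specific layout. The feature type becomes the BED name and the BED score is fixed at 0.

```python
from typing import Iterable, Iterator


def gff_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert GFF feature lines to BED6 lines.

    GFF uses 1-based, fully closed coordinates; BED uses 0-based, half-open
    ones, so only the start needs shifting by one.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # GFF3 may append raw sequence after this directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column feature line
        seqid, _source, ftype, start, end, _score, strand = fields[:7]
        # The GFF score is not carried over; BED score is set to 0.
        yield "\t".join([seqid, str(int(start) - 1), end, ftype, "0", strand])
```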

Good to know. I have not come across this before.

That's a great tool. Solved my problem! Thank you!

8.1 years ago

UCSC BLAT is not ideal for short sequences, but the command-line version of BLAT can be run locally with a small tile size (-tileSize) and the -minMatch and -minIdentity options to produce a PSL file. From there, a conversion script such as psl2bed can turn the PSL file into a BED file for downstream set operations.
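
Because a permissive -minMatch/-minIdentity search can also return partial or gapped hits, it may help to keep only full-length, mismatch-free alignments during the conversion. The Python sketch below is a hypothetical stand-in for that filtering step, not psl2bed itself. It relies on the standard 21-column PSL layout, in which the target coordinates (tStart, tEnd) are already 0-based and half-open, so they can be copied straight into BED.

```python
from typing import Iterable, Iterator


def psl_to_bed(lines: Iterable[str], exact_only: bool = True) -> Iterator[str]:
    """Convert PSL alignment lines to BED6 (chrom, start, end, name, score, strand)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the psLayout header and any line that is not a 21-column record.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
        q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
        strand, q_name, q_size = fields[8], fields[9], int(fields[10])
        t_name, t_start, t_end = fields[13], fields[15], fields[16]
        if exact_only and not (
            mismatches == 0
            and matches + rep_matches == q_size  # every query base aligned and matching
            and q_num_insert == 0
            and t_num_insert == 0
        ):
            continue  # partial or gapped hit
        if strand not in ("+", "-"):
            strand = "."  # e.g. two-character strands from translated searches
        # PSL target coordinates are already 0-based, half-open, like BED.
        yield "\t".join([t_name, t_start, t_end, q_name, "0", strand])
```

Setting `exact_only=False` converts every alignment, which is closer to a plain format conversion.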

Thank you. Good to know.
