Select human protein coding transcripts in Diamond
3.1 years ago
bart ▴ 50

Hi,

I'm trying to select short DNA reads that align to human protein-coding transcripts using DIAMOND. My problem is that DIAMOND does not normally select human reads, so I want to build a database for this with the diamond makedb tool. However, I'm not sure which FASTA file to pass to the --in <file> option. It needs a protein reference database, so would this be the NCBI nr database?

diamond blastx • 761 views
3.1 years ago
GenoMax 147k

You should consider getting the MANE Select (LINK) proteins. See the project description, then download the .faa protein sequence file from the NCBI FTP site. This will contain one entry per gene.

A second option would be to download the curated human proteome files from UniProt (LINK). This set will be redundant and will contain isoforms etc.
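Whichever of these two protein FASTA files you pick, the build-and-search steps look the same. Here is a minimal sketch in Python that assembles the two DIAMOND commands. The file names (`human_proteins.faa`, `reads.fastq`, `hits.tsv`) and the database prefix `human` are placeholders I made up, not anything from the MANE or UniProt downloads, so adjust them to your own files.

```python
import shutil
import subprocess


def makedb_cmd(protein_fasta, db_prefix):
    """Build the command that creates a DIAMOND database from a protein FASTA."""
    return ["diamond", "makedb", "--in", protein_fasta, "--db", db_prefix]


def blastx_cmd(db_prefix, reads, out_tsv):
    """Build the command that aligns DNA reads to the protein database."""
    return [
        "diamond", "blastx",
        "--db", db_prefix,
        "--query", reads,
        "--out", out_tsv,
        "--outfmt", "6",  # BLAST-style tabular output, one line per hit
    ]


def run_all(protein_fasta, db_prefix, reads, out_tsv):
    """Run makedb and then blastx; needs the diamond binary on PATH."""
    if shutil.which("diamond") is None:
        raise RuntimeError("diamond executable not found on PATH")
    subprocess.run(makedb_cmd(protein_fasta, db_prefix), check=True)
    subprocess.run(blastx_cmd(db_prefix, reads, out_tsv), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only for illustration.
    print(" ".join(makedb_cmd("human_proteins.faa", "human")))
    print(" ".join(blastx_cmd("human", "reads.fastq", "hits.tsv")))
```

Calling `run_all("human_proteins.faa", "human", "reads.fastq", "hits.tsv")` would execute both steps. The first column of the tabular output holds the read IDs that aligned to a human protein.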

"so would this be the NCBI nr database?"

That can be a third option. You could get the nr indexes (the latest DIAMOND can now use BLAST indexes) and run the search against them. It may also support filtering based on taxID (which would be 9606 for human). BBuchfink was considering that request from someone.
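If you go the nr route, the taxID restriction could be sketched roughly as below. This assumes your DIAMOND version accepts the `--taxonlist` option and that the database carries taxonomy information, so check `diamond help` for your install before relying on it. The database prefix `nr` and the file names are placeholders of mine. The helper `hit_read_ids` is also my own addition: it collects the read IDs from the first column of the tabular output, which is what you would use to pull the matching reads out afterwards.

```python
def blastx_taxon_cmd(db_prefix, reads, out_tsv, taxid="9606"):
    """Build a blastx command restricted to one taxon (9606 = human).

    Assumes the installed DIAMOND supports --taxonlist and that the
    database was built with taxonomy information.
    """
    return [
        "diamond", "blastx",
        "--db", db_prefix,
        "--query", reads,
        "--out", out_tsv,
        "--outfmt", "6",
        "--taxonlist", taxid,
    ]


def hit_read_ids(tsv_lines):
    """Return the set of query (read) IDs that have at least one hit.

    Expects lines of tab-separated output where column 1 is the read ID.
    """
    ids = set()
    for line in tsv_lines:
        line = line.rstrip("\n")
        if line:
            ids.add(line.split("\t")[0])
    return ids


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(blastx_taxon_cmd("nr", "reads.fastq", "human_hits.tsv")))
```

With a results file on disk, `hit_read_ids(open("human_hits.tsv"))` gives the reads to keep.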
