Retrieving All Abstracts And Searching For Gene Names
13.4 years ago
Random ▴ 160

I'm currently looking for sex-related genes and their pseudogenes in a reptile that I sequenced, for which there is no reference genome.

I would like to look for specific gene names in my BLAST results that fit the description above.

My idea was to go through the literature that contains keywords such as "sex", "bird", "X", "Y", "ZW/ZZ" and "reptiles", extract the words that are written entirely in capitals, because those might well be gene names, and then get rid of common abbreviations.
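The idea described above could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the stop list `COMMON_ABBREVIATIONS`, the function name `candidate_gene_names` and the two example abstracts are all made up for illustration, and the abstracts are assumed to be already downloaded as plain strings.

```python
import re
from collections import Counter

# Hypothetical stop list; extend it with abbreviations common in your abstracts.
COMMON_ABBREVIATIONS = {"DNA", "RNA", "PCR", "USA", "BLAST", "XX", "XY", "ZZ", "ZW"}

# A capital letter followed by at least one more capital letter or digit.
CANDIDATE = re.compile(r"\b[A-Z][A-Z0-9]{1,}\b")


def candidate_gene_names(abstracts, stoplist=COMMON_ABBREVIATIONS):
    """Count all-capital tokens across abstracts, skipping known abbreviations."""
    counts = Counter()
    for text in abstracts:
        for token in CANDIDATE.findall(text):
            if token not in stoplist:
                counts[token] += 1
    return counts


# Invented example abstracts, only to show the behaviour.
abstracts = [
    "FOXL2 and DMRT1 expression was measured by PCR in chicken gonads.",
    "DMRT1 is a Z-linked gene; DNA was extracted from ZW females.",
]
counts = candidate_gene_names(abstracts)
print(counts.most_common())
```

Note that a capitals-only pattern misses mixed-case symbols such as "Foxl2", so the counts are a starting list to review by hand rather than a final answer.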

I just saw http://biostar.stackexchange.com/questions/2205/web-tool-that-converts-a-pubmed-query-into-a-wordle-of-the-abstracts where Lars provides a good solution.

Does anyone have other suggestions (which may not even involve abstract retrieval) that would be more effective for what I'm trying to do, or is that likely the best approach?

Thanks in advance


What's the content of your BLAST output? Do you have gene names in your hits? And what do you mean by "there is no reference": has your organism not been sequenced, are these unknown genes, or do you lack bibliographic references?


The BLAST hits have the match_description, for example "Salmo salar clone BAC 261D01 Foxl2-like protein (Foxl2) gene"

FOXL2 is involved in ovarian development so it may be of interest.

The reason I have to do this is that the sequences were heavily contaminated with bacteria, so I need to identify the specific sequences (in this case contigs, because they have been assembled) that might belong to my lizard.

When I say there's no reference, I mean that there's no reference genome for my species.


Since I'm working with a W sex chromosome, the most similar sequence available is probably the chicken W. Among lizards, Anolis has been sequenced, but it has an XX/XY system, while mine has ZZ/ZW.

My BLAST output has the following columns: query_id, match_description, %_identity, alignment_length, mismatches, gap_openings, q_start, q_end, s_start, s_end, evalue, bit_score
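A table with these columns could be read and scanned for gene symbols along these lines. This is only a sketch under assumptions: the file is assumed to be tab-separated with one hit per line, the names `COLUMNS`, `read_hits` and `contig_1` are hypothetical, and the numeric values in the example row are invented. The only real piece is the match_description quoted earlier in the thread.

```python
import csv
import io
import re

# Column order as listed in the post; "%_identity" renamed for convenience.
COLUMNS = [
    "query_id", "match_description", "pct_identity", "alignment_length",
    "mismatches", "gap_openings", "q_start", "q_end", "s_start", "s_end",
    "evalue", "bit_score",
]

# Gene symbols often appear in parentheses in the description, e.g. "(Foxl2)".
SYMBOL_IN_PARENS = re.compile(r"\(([A-Za-z][A-Za-z0-9-]*)\)")


def read_hits(handle):
    """Yield one dict per hit, with any parenthesised symbols added."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        hit["symbols"] = SYMBOL_IN_PARENS.findall(hit["match_description"])
        yield hit


# Invented example row (assumed tab-separated); numbers are placeholders.
example = io.StringIO(
    "contig_1\tSalmo salar clone BAC 261D01 Foxl2-like protein (Foxl2) gene\t"
    "85.0\t200\t30\t0\t1\t200\t500\t699\t1e-50\t190\n"
)
hits = list(read_hits(example))
print(hits[0]["query_id"], hits[0]["symbols"])
```

The extracted symbols can then be compared, case-insensitively, against a list of candidate sex-related genes.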


I have no idea what's on the W chromosome, so any tandem repeats I find, or any genes linked to sex, may well be of interest.

13.4 years ago
Pasta ★ 1.3k

You could make it simple.

1- Concatenate all your keywords using NCBI query syntax, for example: "sex" OR "ovarian" OR "bird" OR "Gene1" OR "grandad" ... whatever.

2- Copy and paste this query into the NCBI query box (is there a limit on the number of characters you can use? I don't know) and submit.

3- Choose "Send to", then "File".

4- Then you can parse or work on the downloaded file with a script.

You might give it a try, it should work fine if your query is not too long.

You can also try with SRS - that's badass for big queries
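Step 1 above can be scripted so the query string does not have to be typed by hand. This is a small sketch: the function name `build_or_query` is made up, and the keywords are just the examples from the answer plus FOXL2 from the comments.

```python
def build_or_query(keywords):
    """Join keywords into a quoted, OR-separated query string."""
    terms = []
    for keyword in keywords:
        keyword = keyword.strip().replace('"', "")  # avoid nested quotes
        if keyword:
            terms.append('"{}"'.format(keyword))
    return " OR ".join(terms)


query = build_or_query(["sex", "ovarian", "bird", "FOXL2"])
print(query)
```

The resulting string can be pasted into the NCBI query box as in step 2.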

13.4 years ago

There are a few tools that take sequences as input and generate links to relevant PubMed articles:

Maybe one of these tools could help you achieve what you are attempting to do.

13.4 years ago

Here's a side answer that might help you drastically reduce your BLAST hits: since you have heavy bacterial contamination, I would BLAST with all bacterial sequences excluded. If you are using NCBI's online BLAST, you can exclude a complete taxon by its taxonomy ID (Bacteria = 2). To reduce your hits further, you can also BLAST against a specific TaxID only (this can be something as broad as Sauropsida, for instance).
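If you prefer to restrict the search with an Entrez query rather than the organism box, the restriction string could be built like this. This is a sketch under assumptions: the helper `taxid_filter` is hypothetical, and the `txid<N>[Organism]` and `all[filter]` terms are written as Entrez query syntax is commonly used, so check the string in the web form before relying on it.

```python
def taxid_filter(include=None, exclude=()):
    """Build an Entrez restriction from taxonomy IDs.

    include: a single taxonomy ID to restrict the search to (optional).
    exclude: taxonomy IDs to remove from the search.
    """
    if include is not None:
        query = "txid{}[Organism]".format(include)
    else:
        query = "all[filter]"  # start from everything, then subtract
    for taxid in exclude:
        query += " NOT txid{}[Organism]".format(taxid)
    return query


# Bacteria has taxonomy ID 2, as mentioned above.
no_bacteria = taxid_filter(exclude=[2])
print(no_bacteria)
```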
