I'm currently looking for sex-related genes and their pseudogenes in a reptile that I sequenced, for which there is no reference genome.
I would like to search my BLAST results for specific gene names that fit this description.
My idea was to go through literature matching key words such as "sex", "bird", "X", "Y", "ZW/ZZ" and "reptiles", and extract words that are written entirely in capitals, because those might well be gene names, and then remove the common abbreviations.
Does anyone have other suggestions (which need not involve abstract retrieval) that would be more effective for what I'm trying to do, or is this likely the best approach?
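The all-caps idea described above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the abstracts are already available as plain strings, the stop list `COMMON_ABBREVIATIONS` is a hypothetical starting point that would need extending, and the length limits in the regular expression are arbitrary.

```python
import re
from collections import Counter

# Hypothetical stop list of common all-caps abbreviations; extend as needed.
COMMON_ABBREVIATIONS = {"DNA", "RNA", "PCR", "BAC", "BLAST", "NCBI", "FISH"}

# Candidate symbol: a capital letter followed by 2-9 capitals or digits
# (an assumed heuristic, not an official gene-nomenclature rule).
CANDIDATE = re.compile(r"\b[A-Z][A-Z0-9]{2,9}\b")


def candidate_gene_symbols(abstracts):
    """Count in how many abstracts each all-caps candidate token appears."""
    counts = Counter()
    for text in abstracts:
        # set() so a token is counted once per abstract
        for token in set(CANDIDATE.findall(text)):
            if token not in COMMON_ABBREVIATIONS:
                counts[token] += 1
    return counts
```

Tokens that recur across many abstracts are the more plausible gene names. Note that this heuristic misses mixed-case symbols such as "Foxl2", so it would probably need a case-insensitive second pass against a known gene list.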
What is in your BLAST output? Do your hits include gene names? And what do you mean by "there is no reference": has your organism not been sequenced, are these unknown genes, or do you have no bibliographic references?
The BLAST hits have the match_description, for example
"Salmo salar clone BAC 261D01 Foxl2-like protein (Foxl2) gene"
FOXL2 is involved in ovarian development so it may be of interest.
The reason I have to do this is that the sequences were heavily contaminated with bacteria, so I need to identify the specific sequences (in this case contigs, because they have been assembled) that might belong to my lizard.
By "no reference" I mean that there is no reference genome for my species.
Since I'm working with a W sex chromosome, the most similar sequence available is probably the chicken W.
Among lizards, Anolis has been sequenced, but it has an XX/XY system, while my species has ZZ/ZW.
My BLAST output has the following columns:
query_id match_description %_identity alignment_length mismatches gap_openings q_start q_end s_start s_end evalue bit_score
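Given a table with those columns, one way to pull out hits whose match_description mentions a gene of interest might look like the sketch below. It assumes the file is tab-separated with no header row and that a list of gene symbols is already in hand; the function name `hits_matching_genes` is made up for illustration.

```python
import csv
import re

# Column names as listed in the post, in the same order.
COLUMNS = [
    "query_id", "match_description", "%_identity", "alignment_length",
    "mismatches", "gap_openings", "q_start", "q_end", "s_start", "s_end",
    "evalue", "bit_score",
]


def hits_matching_genes(handle, gene_symbols):
    """Return (query_id, gene, evalue) for hits whose description names a gene.

    `handle` is an open text file (assumed tab-separated, no header row).
    """
    if not gene_symbols:
        return []
    # Whole-word, case-insensitive match so "Foxl2" is found when searching "FOXL2".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(g) for g in gene_symbols) + r")\b",
        re.IGNORECASE,
    )
    reader = csv.DictReader(
        handle, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    matches = []
    for row in reader:
        found = pattern.search(row["match_description"])
        if found:
            matches.append(
                (row["query_id"], found.group(1).upper(), float(row["evalue"]))
            )
    return matches
```

With a description like the Salmo salar example, searching for `["FOXL2", "DMRT1"]` would report the contig together with the symbol FOXL2, while contigs whose descriptions name no listed gene are skipped.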
Here's a side answer that might help you drastically reduce your BLAST hits: since you mention heavy bacterial contamination, I would run BLAST with all bacterial sequences excluded. If you are using NCBI's online BLAST, you can exclude a complete taxonomic ID (Bacteria = 2). To reduce your hits further, you can also restrict the search to a specific TaxID (this can be something as broad as Sauropsida, for instance).
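For a local search, the same exclusion might be expressed by building a BLAST+ command line, as in this rough sketch. It assumes BLAST+ is installed, that the program is `blastn`, and that the database is a taxonomy-aware (version 5) database; the file and database names are placeholders. Depending on the BLAST+ version, a high-level taxid such as 2 may first need to be expanded to species-level IDs, so check the documentation for your release.

```python
import shlex


def build_blast_command(query_fasta, database, output_path,
                        excluded_taxid=2, program="blastn"):
    """Build (but do not run) a BLAST+ command that skips one taxon.

    excluded_taxid=2 corresponds to Bacteria, as mentioned in the answer.
    """
    return [
        program,
        "-query", query_fasta,
        "-db", database,
        "-negative_taxids", str(excluded_taxid),  # search everything except this taxon
        "-out", output_path,
    ]


# Placeholder file and database names, for illustration only.
command = build_blast_command("contigs.fasta", "nt", "hits_no_bacteria.txt")
print(shlex.join(command))
```

The list can be passed to `subprocess.run` once the paths are real. Swapping `-negative_taxids` for `-taxids` would restrict the search to a taxon instead of excluding one.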
I have no idea what is on the W chromosome, so any tandem repeats I find, or any genes linked to sex, may well be of interest.