3.8 years ago
Kinoppy
I have an Excel file with more than 10,000 rows from a meta-barcoding study on airborne fungi.
Each row corresponds to an OTU, and I would like to ask whether it is possible to add a separate column with the similarity (%), like the one reported by a local alignment on NCBI. I can do this manually, taking the DNA sequence of each OTU and running the local alignment on NCBI, but it would take me a lot of time...
Is there an automatic way to perform this operation?
Thanks for the help and sorry if I forgot to provide more information.
Run it programmatically using bash + loops.
Well, depending on the compute and storage resources you have, you could just run blast locally (i.e., on your own computer) against your database of interest (rRNAs, I presume?). I believe the NCBI blast API is now deprecated, so unless you want to scrape your way through the results using python or R or whatever, you might have to end up getting an account on Google Cloud or elsewhere to get the job done. The rest of it would just be going from the Excel file to a FASTA file and working through the results of the search (e.g., just taking the best hit or whatever).
I'm not sure which portions of what I've addressed are/would be problematic for you (presuming I understood your stated problem correctly), but in theory, all of this is addressable if you provide us with more information.
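For the Excel-to-FASTA and best-hit steps described above, a minimal Python sketch could look like the following. It is only an illustration under several assumptions: the sheet has been read or exported into rows with columns named `OTU` and `sequence` (hypothetical names), the search was run separately with something like `blastn -query otus.fasta -db <your_db> -outfmt 6 -out hits.tsv`, and the best hit is taken to be the one with the highest bit score. The example rows and hits are made up.

```python
import csv
import io


def rows_to_fasta(rows, id_col="OTU", seq_col="sequence"):
    """Turn table rows (dicts) into FASTA text, one record per OTU."""
    return "".join(f">{row[id_col]}\n{row[seq_col]}\n" for row in rows)


def best_hits(blast_tsv):
    """Keep the highest-bitscore hit per query from BLAST tabular text.

    Assumes the default 12 columns of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for fields in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {"subject": subject,
                           "similarity": pident,
                           "bitscore": bitscore}
    return best


def add_similarity(rows, hits, id_col="OTU"):
    """Return copies of the rows with best-hit name and % identity added."""
    annotated = []
    for row in rows:
        hit = hits.get(row[id_col])
        annotated.append({**row,
                          "best_hit": hit["subject"] if hit else "",
                          "similarity": hit["similarity"] if hit else ""})
    return annotated


# Made-up example data; real rows would come from the Excel/CSV table.
rows = [{"OTU": "OTU_1", "sequence": "ACGTACGT"},
        {"OTU": "OTU_2", "sequence": "TTGACCGA"}]
fasta_text = rows_to_fasta(rows)  # write this to otus.fasta before searching

# Made-up tabular output; real text would be read from hits.tsv.
blast_tsv = (
    "OTU_1\tref_A\t98.5\t8\t0\t0\t1\t8\t1\t8\t1e-5\t50.0\n"
    "OTU_1\tref_B\t99.0\t8\t0\t0\t1\t8\t1\t8\t1e-6\t55.0\n"
)
annotated = add_similarity(rows, best_hits(blast_tsv))
```

OTUs without any hit keep an empty `similarity` cell, so they are easy to spot when the table goes back into Excel.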
Thanks for your detailed answer, I will try to give you more details. I should say that I am still a master's student (in forestry and environmental science) and that this type of work is new to me. So I apologize in advance and thank you for your patience and your invaluable help.
From September to December I was an internship student at a research center that works on meta-barcoding of fungi. I received from my supervisor a data set containing reads of the ITS region of fungal spores from airborne traps, collected in different years. I spent a lot of time studying in order to be able to work with this material, and finally I was able to perform the following operations:
- global alignment using PlutoF;
- filtering of non-fungal species;
- assignment of functional group and trophic mode to each recognized species with FUNGuild;
- graphical representation of community composition using R;
- summary table (with percentage) of the 30 most abundant species of interest (pathogen, mycorrhiza, etc.);
- and finally, a summary table with all the fungi, with percentage subsetted for each year.
My issue is with this last table, because I have a lot of OTUs and my supervisor asked me to also add this column with the similarity level. Finding an automatic way to perform this operation would save me a lot of time.
Thanks again and if I need to provide you more info, just ask.
Edoardo
Thank you for your detailed response.
"- and finally, a summary table with all the fungi, with percentage subsetted for each year." By "percentage", do you mean percentage similarity (against a known fungal species, as established in your original post) or the incidence rate of that OTU (per year)? I'm asking because those are two different things, and I'm not really able to tell which one you're after.
If assigning a taxonomic label to an OTU is what happens to be your problem, running a local blast as I suggested (or probably mmseqs2 instead, given it's lighter on resources and much faster) is what you need to do. If you're having problems with doing this, I guess we'd have to discuss over Skype or Zoom or whatever (I am happy to assist), as it's outside the scope of this forum to provide on-going technical support (I suppose).
Thanks Dunios, I need to enter the percentage similarity between the OTU sequences and the assigned species from the blast alignment.
I will be very happy to receive direct assistance via video call. Let me know how we can manage it and thanks for the support!
Alright, that should be relatively quick work depending on how this plays out.
Hmm, do you use Discord or Skype? If you could share your ID with me I'll add you on one of those platforms.
Alternatively, you can join me on Skype using this link <snip> (I'll delete it once you join).
Let me know what works for you.
Skype is better for me. I can join you after 18:00 (UTC+2) today. Otherwise, tomorrow whenever you prefer.
Edoardo
Sounds good to me. I am online now. I hope that Skype invite link works (I've never tried this before).
Ok, I will try to connect with the link now.