I'm very new to coding in R. Can anyone give me a hint on how to do the following task? I don't know how to convert a .faa file into a table, how to select columns, or how to merge tables into one table.
1- Download the table from the following link: https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/51/1820449%7CHomo%20sapiens/
2- Select the locus and protein products
3- Download protein sequences in FASTA format (protein only, please): https://www.ncbi.nlm.nih.gov/genome/?term=Homo+sapiens
Note: this is a text file (80 MB), and you need to convert it into a table so that you can merge it with the other file.
4- Use these two files to generate a table containing locus, protein products, and protein sequences
The protein products appear in both files, so you can merge the two tables on that column.
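For the FASTA-to-table step in the note above, here is a minimal base R sketch. It assumes NCBI-style headers in which the accession is the text between ">" and the first space; the function name fasta_to_table and the example accessions and sequences are made up for illustration. Packages such as Biostrings can also read FASTA files, but this version needs nothing beyond base R.

```r
# Turn FASTA text lines into a data frame with one row per sequence.
# Assumption: each header looks like ">ACCESSION description ...".
fasta_to_table <- function(lines) {
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)           # which record each line belongs to
  headers <- sub("^>", "", lines[is_header])
  # Group the sequence lines by record; records without sequence stay empty.
  seq_by_record <- split(
    lines[!is_header],
    factor(record_id[!is_header], levels = seq_along(headers))
  )
  sequences <- vapply(seq_by_record, paste, character(1), collapse = "")
  data.frame(
    Protein.product = sub("\\s.*$", "", headers),  # text before the first space
    Sequence = unname(sequences),
    stringsAsFactors = FALSE
  )
}

# Made-up example records, not real NCBI entries.
example_lines <- c(
  ">NP_000001.1 hypothetical protein A [Homo sapiens]",
  "MKTAYIAKQR",
  "QISFVKSHFS",
  ">NP_000002.1 hypothetical protein B [Homo sapiens]",
  "MSDNELLQ"
)
seq_table <- fasta_to_table(example_lines)
print(seq_table)
```

For the real download you would pass the file contents instead, for example fasta_to_table(readLines("protein.faa")), where "protein.faa" is a placeholder for wherever you saved the unzipped file.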
What have you tried, and what is your code so far?
I have started like this:
proteins <- read.csv("https://www.ncbi.nlm.nih.gov/genomes/solr2txt.cgi?q=%5Bdisplay()%5D.from(GenomeProteins).usingschema(%2Fschema%2FGenomeAssemblies).matching(genome_id%3D%3D%5B%2251%22%5D%20and%20genome_assembly_id%3D%3D%5B%221820449%22%5D)&fields=replicon_name%7CName%2Creplicon_accession%7CAccession%2Cstart%7CStart%2Cstop%7CStop%2Cstrand%7CStrand%2Cgene_id%7CGeneID%2Clocus%7CLocus%2Clocus_tag%7CLocus%20tag%2Caccession%7CProtein%20product%2Clength%7CLength%2Cname%7CProtein%20Name&filename=proteins_51_1820449.csv&nolimit=on")
table(proteins$Locus)
table(proteins$Protein.product)
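Note that table() counts how often each value occurs; it does not select columns. To keep only the locus and protein product columns (step 2), you can index the data frame by column name. A small sketch, using a made-up stand-in for the downloaded table so it runs on its own; the column names follow the ones used in the code above, where read.csv has turned "Protein product" into Protein.product:

```r
# Toy stand-in for the downloaded table; all values are invented.
proteins <- data.frame(
  Locus = c("GENE1", "GENE2"),
  Protein.product = c("NP_000001.1", "NP_000002.1"),
  Length = c(20L, 8L),
  stringsAsFactors = FALSE
)

# Keep only the two columns of interest, selected by name.
locus_table <- proteins[, c("Locus", "Protein.product")]
print(locus_table)
```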
Shouldn't this have been covered in class in your course, or where did you get the assignment from?
There is a merge function in R.
I have not taken a formal course on this. I am just trying to improve my skills on my own.
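As a rough illustration of the merge function mentioned above, the sketch below joins a locus table and a sequence table on their shared column. Both data frames and all their values are invented for the example; the only assumption carried over from the thread is that the shared column is called Protein.product in both tables.

```r
# Invented example tables that share the Protein.product column.
locus_table <- data.frame(
  Locus = c("GENE1", "GENE2", "GENE3"),
  Protein.product = c("NP_000001.1", "NP_000002.1", "NP_000003.1"),
  stringsAsFactors = FALSE
)
seq_table <- data.frame(
  Protein.product = c("NP_000001.1", "NP_000002.1"),
  Sequence = c("MKTAYIAKQRQISFVKSHFS", "MSDNELLQ"),
  stringsAsFactors = FALSE
)

# By default merge() keeps only rows whose key appears in both tables.
combined <- merge(locus_table, seq_table, by = "Protein.product")

# all.x = TRUE keeps every row of the first table and fills gaps with NA.
combined_all <- merge(locus_table, seq_table, by = "Protein.product", all.x = TRUE)

print(combined)
print(combined_all)
```

Which variant fits depends on whether proteins without a sequence should stay in the final table.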