I downloaded an assembled genome from NCBI in .fna format. Its content looks like this (a sample):
GAATGTCACGGCGAGTGAAACTACGTAAATTAAATATCTATTGTGAAGAACatgttctaagagtttttttcagagcattc
tgcctcattttcgaatCTAAACTTAGGTAAGAGTTTGAAATAAGGGTAAATGTTTCTTGATGACCATATggcttgtatgg
tggatgaaagttctttAAACCACATGctacaactcagtaatgaatgatTGTCGAATCCGAGATGCATGTAGCGTATTTGA
AACATGGAACATCACAATGtgtgaaactatgtaaattacatatttcttgggtagaactcgctccaagagtaTTTTTCTGC
What do the lowercase letters mean? How can I convert all nucleotides to uppercase in the FASTA file?
Also, the sequence identifiers look like this:
>NW_011509460.1 unplaced genomic scaffold, scaffold730, whole genome shotgun sequence
>NW_011509461.1 unplaced genomic Scaffold670, whole genome shotgun sequence
I want to convert them to:
>NW_011509460.1
>NW_011509461.1
What's the link between your question and the title of this thread?
R has a toupper() function. Say you have your FASTA file saved as an object called df; then you can use something like:
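A minimal sketch of that idea, assuming `df` is a character vector with one FASTA line per element (for example, as returned by readLines()). The sample lines are shortened from the question for illustration. Header lines (those starting with `>`) are excluded from the case conversion and trimmed to the accession instead, which covers the identifier part of the question as well.

```r
# Assumption: `df` holds the FASTA file as a character vector, one line per
# element, e.g. df <- readLines("genome.fna") (hypothetical file name).
df <- c(
  ">NW_011509460.1 unplaced genomic scaffold, scaffold730, whole genome shotgun sequence",
  "GAATGTCACGGCGAGTGAAACTACGTAAATTAAATATCTATTGTGAAGAACatgttctaag",
  "tgcctcattttcgaatCTAAACTTAGG"
)

# Header lines start with ">"; every other line is treated as sequence.
is_header <- startsWith(df, ">")

# Uppercase the sequence lines only, leaving the headers untouched.
df[!is_header] <- toupper(df[!is_header])

# Keep only the first whitespace-delimited token of each header (the accession).
df[is_header] <- sub("[[:space:]].*$", "", df[is_header])

print(df)
```

Keep in mind that lowercase letters in NCBI genome FASTA files typically mark soft-masked sequence (repeats and low-complexity regions), so uppercasing everything discards that annotation.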
For more information, type ?toupper at the R command line.
The toupper() R function has worked well for me. I was using assembly software that produced lowercase output, and I needed to convert it to uppercase for use in downstream quality-check software. The simple R script I used was:
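A minimal sketch of what such a script could look like (not the poster's original; the function name `fasta_to_upper` and the demo file contents are hypothetical). It reads a FASTA file, uppercases the sequence lines, and writes the result to a new file.

```r
# Hypothetical helper: uppercase the sequence lines of a FASTA file and
# write the result to a new file. Header lines (">...") are left as they are.
fasta_to_upper <- function(in_path, out_path) {
  lines <- readLines(in_path)
  is_header <- startsWith(lines, ">")
  lines[!is_header] <- toupper(lines[!is_header])
  writeLines(lines, out_path)
  invisible(out_path)
}

# Demo on temporary files so the sketch runs without a real assembly.
in_file <- tempfile(fileext = ".fasta")
out_file <- tempfile(fileext = ".fasta")
writeLines(c(">contig_1 example", "acgtnACGT", "ttgaca"), in_file)

fasta_to_upper(in_file, out_file)
result <- readLines(out_file)
print(result)
```

readLines() loads the whole file into memory, which is usually fine for a single assembly; very large files might need to be processed in chunks instead.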