4.2 years ago
yaghoub.amraei
Hello all, I have a set of small RNA-seq data: the output of mapping small RNA-seq reads against the miRBase database with CLC Genomics software. I now need to use this mapping output to retrieve the sequences of the conserved miRNAs, using the accession numbers of the stem-loop structures in miRBase. Can someone help me with an awk or grep command?
These are my conserved FASTA sequences from the CLC Genomics mapping:
>MIR156d//MIR156a//MIR156f_Cucumis_melo//Manihot_esculenta//Brachypodium_distachyon//Asparagus_officinalis
TGACAGAAGAGAGTGAGCAC
>MIR157a//MIR157b//MIR157c//MIR156b//MIR156c_Arabidopsis_thaliana//Solanum_lycopersicum
TTGACAGAAGATAGAGAGCAC
>MIR157d//MIR156i_Arabidopsis_thaliana//Fragaria_vesca
TGACAGAAGATAGAGAGCAC
>MIR159_Aquilegia_caerulea
TTTGGACTGAAGGGAGCTCTA
These are the precursors from hairpin.fa, downloaded from miRBase:
>cel-mir-2 MI0000004 Caenorhabditis elegans miR-2 stem-loop
UAAACAGUAUACAGAAAGCCAUCAAAGCGGUGGUUGAUGUGUUGCAAAUUAUGACUUUCA
UAUCACAGCCAGCUUUGAUGUGCUGCCUGUUGCACUGU
>cel-mir-34 MI0000005 Caenorhabditis elegans miR-34 stem-loop
CGGACAAUGCUCGAGAGGCAGUGUGGUUAGCUGGUUGCAUAUUUCCUUGACAACGGCUAC
CUUCACUGCCACCCCGAACAUGUCGUCCAUCUUUGAA
>cel-mir-35 MI0000006 Caenorhabditis elegans miR-35 stem-loop
UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUA
UCACCGGGUGGAAACUAGCAGUGGCUCGAUCUUUUCC
>cel-mir-36 MI0000007 Caenorhabditis elegans miR-36 stem-loop
CACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUA
UCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGA
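You would need to convert the U to T in the miRBase data (sed '/^>/!s/U/T/g' hairpin.fa > converted.fa). The /^>/! address restricts the substitution to sequence lines, so any capital U in a header is left alone. Once you do that, you would be able to use blat (or grep) to find your sequences in the file above. You may have to linearize the FASTA sequences before a plain grep search, because each hairpin is wrapped over several lines and a match can span a line break.

Thank you for the comment.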
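For anyone who wants the steps from the reply in one place, here is a minimal sketch in awk. It assumes the hairpin headers follow the layout shown in the question (ID first, MI accession second) and that an exact substring match on the forward strand is enough. The function names linearize_fasta and find_precursors are made up for this example.

```shell
# Linearize a FASTA file: one record per line as header<TAB>sequence.
# Sequence lines are uppercased and U is converted to T; headers are untouched.
linearize_fasta() {
  awk '
    /^>/ {
      if (hdr != "") print hdr "\t" seq   # flush the previous record
      hdr = substr($0, 2); seq = ""
      next
    }
    {
      line = toupper($0)
      gsub(/U/, "T", line)
      seq = seq line                      # join wrapped sequence lines
    }
    END { if (hdr != "") print hdr "\t" seq }
  ' "$1"
}

# For every query sequence, report each hairpin that contains it.
# $1 = query FASTA (e.g. the CLC output), $2 = linearized hairpin table.
# Output columns: query header, MI accession, hairpin ID.
find_precursors() {
  linearize_fasta "$1" | awk -F'\t' '
    NR == FNR { hp_hdr[FNR] = $1; hp_seq[FNR] = $2; n = FNR; next }
    {
      for (i = 1; i <= n; i++)
        if (index(hp_seq[i], $2) > 0) {
          split(hp_hdr[i], f, " ")        # f[1] = ID, f[2] = accession
          print $1 "\t" f[2] "\t" f[1]
        }
    }
  ' "$2" -
}
```

Possible usage, with conserved.fa standing in for whatever the CLC export is called: run linearize_fasta hairpin.fa > hairpin.tsv once, then find_precursors conserved.fa hairpin.tsv > hits.tsv. The scan compares every query against every hairpin, which is slow but simple; for large query sets blat would likely be the better tool, as suggested above.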