I am preparing a coding and non-coding DNA training data set for CPAT. I downloaded the ncRNA FASTA file for the cattle genome from the Ensembl FTP site using the link ftp://ftp.ensembl.org/pub/release-95/fasta/bos_taurus/ncrna/. As it contains the sequences of all ncRNAs, I want to extract only the lncRNA sequences from the total ncRNAs. grep 'lncRNA' only prints the header lines (the ones starting with '>'), not the FASTA sequence below them. How can I extract the FASTA sequences of the lncRNAs from this ncRNA FASTA file?
You can always try with bedtools getfasta -fi <fasta_reference> -fo <fasta_output> -bed <your_interval_file>
One observation: be careful with carriage returns if you are using macOS (see info here if needed), as your FASTA output will contain only the first line of your BED annotation.
Thank you. I directed grep 'lncRNA' output to headers.txt and used faSomeRecords.py.
Bedtools would have been easy too, but I was not sure which option to use in 'Create one BED record per:' in the Table Browser. faSomeRecords.py worked well.
For the record, there is no FaSomeRecords.py (if there is, please provide a link). The faSomeRecords utility (linux version linked) is part of Jim Kent's (UCSC) tools.
I downloaded it with wget https://raw.githubusercontent.com/santiagosnchez/faSomeRecords/master/faSomeRecords.py
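For anyone who would rather do the header filtering and the record extraction in one step, here is a minimal sketch in plain Python. It is not how faSomeRecords or bedtools work internally. It simply keeps every record whose header line contains a keyword. The function names `filter_fasta` and `extract_records` and the file names in the usage comment are made up for illustration, and it assumes the biotype appears as the text 'lncRNA' in the header, as the grep in the question suggests.

```python
def filter_fasta(lines, keyword="lncRNA"):
    """Yield the lines of every FASTA record whose header contains keyword.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            keep = keyword in line
        if keep:
            # Emits the header and all sequence lines that follow it.
            yield line


def extract_records(in_path, out_path, keyword="lncRNA"):
    """Write the matching records from in_path to out_path (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, keyword))


# Example call (file names are placeholders, decompress the download first):
# extract_records("ncrna.fa", "lncrna.fa")
```

Because the match is a plain substring test on the header, it is worth checking a few headers first to make sure the keyword does not also match a biotype you do not want.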