How to extract lncRNA FASTA sequences from the whole ncRNA FASTA downloaded from the Ensembl FTP site?
5.7 years ago
kousi31 ▴ 100

Hi all,

I am preparing a coding and non-coding DNA training data set for CPAT. I downloaded the ncRNA FASTA file for the cattle genome from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-95/fasta/bos_taurus/ncrna/). As it contains the sequences of all ncRNAs, I want to extract only the lncRNA sequences from it. grep 'lncRNA' only prints the header lines (those starting with '>'), not the FASTA sequences below them. How can I extract the lncRNA records, headers and sequences, from this ncRNA FASTA file?

Suggestions please and thanks in advance.

sequence • 2.1k views
5.7 years ago
Joe 21k

Linearise your sequences like so (or use any one of a number of solutions on the forum):

{ while IFS= read -r line ; do if [ "${line:0:1}" = ">" ]; then printf '\n%s\n' "$line" ; else printf '%s' "$line" ; fi ; done < fasta.fa ; echo ; } | tail -n +2 > linear.fasta

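Then just use grep -A 1 'lncRNA' linear.fasta (without a leading '>' in the pattern, unless your headers actually begin with lncRNA). With GNU grep, adding --no-group-separator suppresses the '--' lines printed between non-adjacent matches.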
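
As an alternative sketch that skips the linearising step, awk can print every record whose header contains a given string in a single pass. The helper name extract_records and the file names are made up for illustration, and it assumes the biotype string (here lncRNA) appears somewhere in the header line, as the grep in the question suggests.

```shell
# Print every FASTA record whose header line contains the given string.
# extract_records is a hypothetical helper name, not part of any tool.
extract_records() {
    pattern="$1"
    fasta="$2"
    awk -v pat="$pattern" '
        /^>/ { keep = (index($0, pat) > 0) }  # new header: decide whether to keep this record
        keep                                  # print header and sequence lines of kept records
    ' "$fasta"
}

# Example (file names are assumptions):
# extract_records lncRNA ncrna.fa > lncrna.fa
```

Because the match is a literal substring test on the header only, multi-line sequences are kept intact and no '--' separators are produced.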

5.7 years ago

You can always try bedtools getfasta -fi <fasta_reference> -fo <fasta_output> -bed <your_interval_file>

One caveat: be careful with carriage-return line endings if you are using macOS (see info here if needed), as your FASTA output may then contain only the first line of your BED annotation.
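
If carriage returns are the suspect, one way to normalise the BED file before running bedtools is sketched below. The helper name strip_cr and the file names are illustrative assumptions. It rewrites both CRLF and lone-CR line endings as plain LF.

```shell
# Rewrite CRLF and lone-CR line endings as plain LF.
# strip_cr is a hypothetical helper name; the cleaned file goes to stdout.
strip_cr() {
    awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }' "$1"
}

# Example (file names are assumptions):
# strip_cr intervals.bed > intervals.unix.bed
# bedtools getfasta -fi genome.fa -bed intervals.unix.bed -fo out.fa
```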

Cheers, and I hope it's useful

5.7 years ago
kousi31 ▴ 100

Thank you. I redirected the output of grep 'lncRNA' to headers.txt and used faSomeRecords.py. Bedtools would have been easy too, but I was not sure which option to choose under 'Create one BED record per:' in the Table Browser. FaSomeRecords.py worked good.


FaSomeRecords.py worked good.

For the record, there is no FaSomeRecords.py (if there is, please provide a link). The faSomeRecords utility (Linux version linked) is part of Jim Kent's (UCSC) tools.
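
For anyone following the same route, here is a rough sketch of building the name list that faSomeRecords takes as its second argument. It assumes the record name is the first whitespace-delimited word of each header (without the '>'), which is worth checking against your own file. The helper name header_ids and the file names are illustrative.

```shell
# List the IDs (first word of the header, without ">") of records whose
# header contains the given string. header_ids is a hypothetical helper name.
header_ids() {
    pattern="$1"
    fasta="$2"
    awk -v pat="$pattern" '/^>/ && index($0, pat) > 0 { print substr($1, 2) }' "$fasta"
}

# Example (file names are assumptions):
# header_ids lncRNA ncrna.fa > names.txt
# faSomeRecords ncrna.fa names.txt lncrna.fa
```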
