Question

How to extract the FASTA sequences of lncRNAs from the whole ncRNA FASTA of a UCSC FTP download?

6.3 years ago

kousi31 ▴ 100

Hi all,

I am preparing coding and non-coding DNA training data sets for CPAT. I downloaded the ncRNA FASTA file for the cattle genome using the link ftp://ftp.ensembl.org/pub/release-95/fasta/bos_taurus/ncrna/ (note that this link points to the Ensembl FTP site). The file contains the sequences of all ncRNAs, and I want to extract only the lncRNA sequences from it. grep 'lncRNA' prints only the matching header lines (the lines starting with '>'), not the FASTA sequence below them. How can I extract the lncRNA FASTA sequences from this ncRNA FASTA file?

Suggestions please and thanks in advance.

sequence • 2.3k views

score 0 · Answer 1 · 2019-04-03

Linearise your sequences like so (or use any one of a number of solutions on the forum):

while IFS= read -r line; do if [ "${line:0:1}" = ">" ]; then printf '\n%s\n' "$line"; else printf '%s' "$line"; fi; done < fasta.fa | tail -n +2 > linear.fasta

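Then just use grep -A 1 --no-group-separator 'lncRNA' linear.fasta (the pattern has to match anywhere in the header line, because Ensembl headers start with the transcript ID rather than the biotype; --no-group-separator is a GNU grep option that stops grep from printing -- between non-adjacent matches).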
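
If you would rather skip the linearising step, the same header-based filter can be sketched in Python. This is only a minimal sketch under assumptions: the function name filter_fasta and the example records are made up for illustration, and it assumes the biotype string (here 'lncRNA') appears somewhere in each header line you want to keep.

```python
def filter_fasta(lines, keyword):
    """Yield the lines of every FASTA record whose header contains keyword.

    Works on multi-line records, so no linearising is needed.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            keep = keyword in line
        if keep:
            yield line


# Tiny in-memory example with made-up records (hypothetical IDs).
example = [
    ">tx1 ncrna transcript_biotype:lncRNA\n",
    "ACGT\n",
    "GGCC\n",
    ">tx2 ncrna transcript_biotype:snRNA\n",
    "TTTT\n",
    ">tx3 ncrna transcript_biotype:lncRNA\n",
    "CCCC\n",
]
filtered = list(filter_fasta(example, "lncRNA"))
```

With real files, an open file handle can be passed in place of the list, for example dst.writelines(filter_fasta(src, "lncRNA")) inside a with block that opens the input as src and the output as dst.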

score 0 · Answer 2 · 2019-04-03

You can also try bedtools getfasta -fi <fasta_reference> -fo <fasta_output> -bed <your_interval_file>, where <fasta_reference> is the FASTA file that the coordinates in your BED file refer to.

One observation: be careful with carriage-return line endings if you are using macOS (see info here if needed), as otherwise your FASTA output will contain only the first line of your BED annotation.
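
As a rough illustration of the line-ending point, here is a minimal Python sketch that converts carriage returns to Unix newlines before the BED file goes to bedtools. The helper name normalise_newlines and the example intervals are hypothetical, and it works on bytes so that Python does not translate the line endings on its own.

```python
def normalise_newlines(data):
    """Convert CRLF and bare CR line endings in a bytes object to LF."""
    # Replace CRLF first so that it does not turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Made-up BED content that uses bare carriage returns as line endings.
bed_with_cr = b"chr1\t100\t200\tregion1\rchr1\t300\t400\tregion2\r"
fixed = normalise_newlines(bed_with_cr)
```

For a real file, read it with open(path, "rb"), pass the bytes through the helper, and write the result with open(new_path, "wb").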

Cheers, and I hope it's useful

score 0 · Answer 3 · 2019-04-03

6.3 years ago

kousi31 ▴ 100

Thank you. I redirected the grep 'lncRNA' output to headers.txt and used faSomeRecords.py to pull out those records. Bedtools would probably have been easy too, but I was not sure which option to use under 'Create one BED record per:' in the Table Browser. FaSomeRecords.py worked good.
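
For anyone following along, the headers.txt approach can be sketched roughly as below in Python. This is not the code of faSomeRecords.py, only an illustration of the idea: the function names read_ids and select_records are hypothetical, and matching on the first whitespace-separated token of each header is an assumption about how the IDs should be compared.

```python
def read_ids(header_lines):
    """Collect record IDs: the first token of each header, without '>'."""
    ids = set()
    for line in header_lines:
        fields = line.strip().lstrip(">").split()
        if fields:  # skip blank or empty header lines
            ids.add(fields[0])
    return ids


def select_records(fasta_lines, wanted_ids):
    """Yield the lines of FASTA records whose ID is in wanted_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted_ids
        if keep:
            yield line


# Made-up example: headers as grep would have written them to headers.txt.
headers = [
    ">tx1 ncrna transcript_biotype:lncRNA\n",
    ">tx3 ncrna transcript_biotype:lncRNA\n",
]
fasta = [
    ">tx1 ncrna transcript_biotype:lncRNA\n",
    "ACGT\n",
    ">tx2 ncrna transcript_biotype:snRNA\n",
    "TTTT\n",
    ">tx3 ncrna transcript_biotype:lncRNA\n",
    "CCCC\n",
    "GGGG\n",
]
selected = list(select_records(fasta, read_ids(headers)))
```

Keeping the IDs in a set makes each lookup cheap, which matters when the header list is long.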

FaSomeRecords.py worked good.

For the record, there is no FaSomeRecords.py (if there is, please provide a link). The faSomeRecords utility (Linux version linked) is part of Jim Kent's (UCSC) tools.

ADD REPLY • link 6.3 years ago by GenoMax 152k

I downloaded it with wget https://raw.githubusercontent.com/santiagosnchez/faSomeRecords/master/faSomeRecords.py

ADD REPLY • link 6.3 years ago by kousi31 ▴ 100