Question

RNA-seq amending fasta/fastq files (from one line into two lines)

0

7.2 years ago

canberkyurek • 0

Hi,

I have been analyzing a small RNA-seq data set and ran into a small problem with my fasta/fastq files. After trimming and collapsing, I wanted to keep only reads that are 22 nt long with a guanine at the 5' end. These are the commands I used to filter the reads:

cat input_wt3_trimmed_collapsed_1_2.fq | paste - - | awk 'length($4) >= 22 && length($4) <=22' | sed 's/\t/\n/g' > input_wt3_trimmed_collapsed_2.fq

awk '$2 ~ /^G/'  elution_wt1_trimmed_collapsed_1_2.fq >  elution_wt1_trimmed_collapsed_1_2_22Gs_2.fq

However, these commands converted my fasta/fq files from the two-line format into a one-line format. Here is an example:

before:

>1-1763
TACCCGTATAAGTTTCTGCTGAG
>2-1550
TGAGATCGTTCAGTACGGCAA

after:

>73-969 GAGATCGGGCGGGAAGTGGTAT
>89-940 GTTTCCGGCTCACGTCCTCTGA
>90-938 GCGTGTAAGTTCGGCGGCGTGA

I would really appreciate any suggestions for a better way to fix this. When I try to map these reads with STAR, the file is not recognised as a compatible format. I guess I need to convert the final file back into a two-line fasta file such as:

>73-969
GAGATCGGGCGGGAAGTGGTAT
>89-940
GTTTCCGGCTCACGTCCTCTGA
>90-938
GCGTGTAAGTTCGGCGGCGTGA

What could be the best way to fix this problem?

best

Ahmet

RNA-Seq fasta siRNA • 2.1k views

ADD COMMENT • link updated 7.2 years ago by WouterDeCoster 47k • written 7.2 years ago by canberkyurek • 0

1

Please reformat the examples; everything is on just one line. Also, the length test is simply length($4) == 22, and in awk you can also test for a G at the 5' end.
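
For illustration only (this sketch is not part of the original comment): assuming the collapsed input is a two-line fasta file like the "before" example in the question, both tests can be done in a single awk pass that also keeps the two-line layout. The function name filter_22g is hypothetical.

```shell
# Hypothetical helper: keep only 22-nt reads that start with G.
# Assumes a strict two-line fasta input (header line, then sequence line).
filter_22g() {
    awk '
        /^>/ { header = $0; next }      # remember the most recent header line
        length($0) == 22 && /^G/ {      # sequence is 22 nt and starts with G
            print header                # header on its own line
            print $0                    # sequence on the following line
        }
    ' "$1"
}

# Example call (file names are placeholders):
# filter_22g input_collapsed.fa > input_collapsed_22G.fa
```

Because the header and the sequence are printed with separate print statements, the output stays in two-line format and no later conversion step should be needed.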

ADD REPLY • link 7.2 years ago by Ido Tamir 5.2k

1

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

ADD REPLY • link 7.2 years ago by WouterDeCoster 47k

0

From the code in your original post, I understand that you are parsing a fq file and that your output is also a fastq file. But the examples you provided are neither fastq/fq nor fasta. Could you please post one or a few records from the fq file?

ADD REPLY • link 7.2 years ago by cpad0112 21k

score 2 · Accepted Answer · 2017-10-07

2

7.2 years ago

WouterDeCoster 47k

Looks like you need to convert the space between header and sequence to a newline. Try tr:

tr ' ' '\n' < elution_wt1_trimmed_collapsed_1_2_22Gs_2.fq > output.fq
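
Before passing the converted file to an aligner, it may be worth checking that it really is in two-line fasta format. The following is a minimal sketch of such a check; the function name check_two_line_fasta and the allowed alphabet (A, C, G, T, N in either case) are assumptions, not something from the thread.

```shell
# Hypothetical sanity check: odd lines must be headers, even lines sequences.
# Exits with status 0 if the file looks like two-line fasta, 1 otherwise.
check_two_line_fasta() {
    awk '
        NR % 2 == 1 && !/^>/ {
            print "line " NR ": expected a header starting with >"
            bad = 1
        }
        NR % 2 == 0 && !/^[ACGTNacgtn]+$/ {
            print "line " NR ": expected a sequence line"
            bad = 1
        }
        END {
            if (NR % 2 != 0) {          # a header without a sequence line
                print "odd number of lines"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}

# Example call (file name is a placeholder):
# check_two_line_fasta output.fq && echo "looks like two-line fasta"
```

A file that still has header and sequence on the same line fails this check, because every second line then starts with > instead of a base.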

ADD COMMENT • link 7.2 years ago by WouterDeCoster 47k

0

The same with sed, starting from this file:

$ cat test.tab 
>73-969 GAGATCGGGCGGGAAGTGGTAT
>89-940 GTTTCCGGCTCACGTCCTCTGA
>90-938 GCGTGTAAGTTCGGCGGCGTGA

Converted with sed:

$ sed -e 's/ /\n/g' test.tab 
>73-969
GAGATCGGGCGGGAAGTGGTAT
>89-940
GTTTCCGGCTCACGTCCTCTGA
>90-938
GCGTGTAAGTTCGGCGGCGTGA
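
A side note with a sketch that is not from the original reply: POSIX does not guarantee that \n in the replacement part of a sed s command produces a newline, so this may behave differently with a non-GNU sed. Assuming every record is exactly "header, whitespace, sequence" on one line, an awk version avoids that question. The function name split_header_seq is hypothetical.

```shell
# Hypothetical helper: turn one-line "header sequence" records into two lines.
# Lines that do not have exactly two fields are passed through unchanged.
split_header_seq() {
    awk '
        NF == 2 { print $1; print $2; next }   # header, then sequence
        { print }                              # anything else: leave as is
    ' "$1"
}

# Example call (output file name is a placeholder):
# split_header_seq test.tab > test_two_line.fa
```

Since awk splits on any run of spaces or tabs, this handles both the space-separated and the tab-separated variants of the one-line format.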

ADD REPLY • link 7.2 years ago by cpad0112 21k

0

yup, that also works

ADD REPLY • link 7.2 years ago by WouterDeCoster 47k

0

Thanks mate! This solved the problem completely!

ADD REPLY • link 7.2 years ago by canberkyurek • 0

0

Glad to help.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 7.2 years ago by WouterDeCoster 47k

0

Thanks! I will follow your suggestions!

ADD REPLY • link 7.2 years ago by canberkyurek • 0

0

Welcome to biostars. Interesting guidelines for posting can be found in the following posts:

ADD REPLY • link 7.2 years ago by WouterDeCoster 47k