RNA-seq amending fasta/fastq files (from one line into two lines)
7.2 years ago

Hi,

I have been analyzing a small RNA-seq data set and ran into a problem with my fasta/fastq files. After trimming and collapsing, I wanted to keep only reads that are 22 nt long with a guanine at the 5' end. This is the code I used to filter the reads:

cat input_wt3_trimmed_collapsed_1_2.fq | paste - - | awk 'length($4) >= 22 && length($4) <=22' | sed 's/\t/\n/g' > input_wt3_trimmed_collapsed_2.fq

awk '$2 ~ /^G/'  elution_wt1_trimmed_collapsed_1_2.fq >  elution_wt1_trimmed_collapsed_1_2_22Gs_2.fq

However, these commands converted my fasta/fq files from the two-line format into a one-line format. Here is an example:

before:

>1-1763
TACCCGTATAAGTTTCTGCTGAG
>2-1550
TGAGATCGTTCAGTACGGCAA

after:

>73-969 GAGATCGGGCGGGAAGTGGTAT
>89-940 GTTTCCGGCTCACGTCCTCTGA
>90-938 GCGTGTAAGTTCGGCGGCGTGA

I would really appreciate a better way of fixing this. When I try to map these reads with STAR, the file is not recognised as a compatible format. I guess I need to convert the final file into a two-line fasta file such as:

>73-969 
GAGATCGGGCGGGAAGTGGTAT
>89-940 
GTTTCCGGCTCACGTCCTCTGA
>90-938 
GCGTGTAAGTTCGGCGGCGTGA

What could be the best way to fix this problem?

best

Ahmet

RNA-Seq fasta siRNA • 2.0k views

Please reformat the examples; everything is on one line. Also, the length test can simply be length($4) == 22, and in awk you can also test for a G at the 5' end.
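
For what it's worth, both tests fit in one awk call that reads the two-line fasta directly, so the paste step and the one-line output never come up. A minimal sketch, assuming collapsed fasta with exactly one sequence line per record (not four-line fastq) and no trailing whitespace; filter_22g is just an illustrative name, not something from the original post:

```shell
# Hypothetical helper: read two-line fasta on stdin and keep only
# records whose sequence is exactly 22 nt long and starts with G.
filter_22g() {
  awk '
    /^>/ { header = $0; next }     # remember the most recent header line
    length($0) == 22 && /^G/ {     # sequence line passing both tests
      print header                 # header on its own line
      print $0                     # sequence on the following line
    }
  '
}

# Example using two reads taken from the question
printf '>1-1763\nTACCCGTATAAGTTTCTGCTGAG\n>73-969\nGAGATCGGGCGGGAAGTGGTAT\n' | filter_22g
```

On a real file this would be called as filter_22g < in.fa > out.fa, where both file names are placeholders.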


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.


From the code in your post, I understand that you are parsing a fq file and that your output is also a fastq file. But the examples you provided are neither fastq nor fasta. Could you please post one or a few records from the fq file?

7.2 years ago

Looks like you need to convert the space to a newline. Try tr:

cat elution_wt1_trimmed_collapsed_1_2_22Gs_2.fq | tr ' ' '\n' > output.fq

The same with sed. Input:

$ cat test.tab 
>73-969 GAGATCGGGCGGGAAGTGGTAT
>89-940 GTTTCCGGCTCACGTCCTCTGA
>90-938 GCGTGTAAGTTCGGCGGCGTGA

Command and output (GNU sed interprets \n in the replacement as a newline):

$ sed -e 's/ /\n/g' test.tab 
>73-969
GAGATCGGGCGGGAAGTGGTAT
>89-940
GTTTCCGGCTCACGTCCTCTGA
>90-938
GCGTGTAAGTTCGGCGGCGTGA
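
If the headers might carry a trailing space (as in the desired output in the question) or the file is partly two-line already, splitting on fields in awk is a bit more forgiving than replacing every space. A sketch under the assumption that each one-line record is "header, whitespace, sequence"; one_to_two_line is a made-up name for illustration:

```shell
# Hypothetical helper: turn ">id SEQUENCE" lines into two-line fasta.
# Lines that are already in two-line form pass through unchanged.
one_to_two_line() {
  awk '
    /^>/ {
      print $1                 # header only, surrounding whitespace dropped
      if (NF >= 2) print $2    # sequence, if it shared the header line
      next
    }
    { print }                  # any other line (an existing sequence line)
  '
}

# Example using records from the question
printf '>73-969 GAGATCGGGCGGGAAGTGGTAT\n>89-940 GTTTCCGGCTCACGTCCTCTGA\n' | one_to_two_line
```

Note that anything after the second field on a header line is discarded, so this only suits simple headers like the ones shown here.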

yup, that also works


Thanks mate! This solved the problem completely!


Glad to help.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


Thanks! I will follow your suggestions!
