Parse file to remove two characters from the ends of a lot of lines
7.9 years ago

Dear all,

I have a txt file with data of this form:

@HWI-ST999:188:C49E6ACXX:8:1101:11404:1998/1
NCGAGGGATGGGAGACCTGGTTGGAAATCCGTGGCTGTTTGGTTGGGGGAT
+
#4=DDFFDHDHHGJIIJIJIFHIIIJJIJJJDHIGIEDGHGI=CGHJJH9>
@HWI-ST999:188:C49E6ACXX:8:1101:1754:2212/1
TCGAATGCATGATAACAATAACCCTGGAACAGGCAACCGTTGTCCCTGACC
+
CCCFFFFFHHHGHJJJJJJJJJJJJJJJJJJJJIJJJJJIIJJJJJJJJJJ

I would like to remove the /1 from the end of every fourth line, starting with the first (lines 1, 5, 9, ...). Is this possible with a one-liner in bash, maybe with sed (on OS X)?

Context: I extracted reads from a BAM file into FASTQ format with Bam2Fastq, but the subsequent processing cannot cope with the /1 or /2 suffixes in my two files of paired-end reads.

command line OSX • 2.3k views

Would be more accurate to call this a fastq file, it's not just a txt file...


But someone who does not know what a fastq format is would be put off by it. And basically it can be called a txt file.


That doesn't make sense. You don't want answers from people who don't know what a fastq file is. This is biostars. We read fastq files at breakfast like normal people read the newspaper.


For this question it is unnecessary to know anything about biology. And I want to keep the question as easy to understand as possible. It would be more accurate to call it a fastq file, I guess, but not helpful here and in some cases maybe distracting. Although I guess 99.9% of people here know that it is a fastq file. And for those, I don't need to make it clear anyways.

7.9 years ago
george.ry ★ 1.2k

New file: sed '1~4s/\/1$//' myfile.fq > mynewfile.fq

Inplace: sed -i '1~4s/\/1$//' myfile.fq
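A portable alternative, in case GNU sed is not available: the `1~4` address is a GNU extension, but the same "every fourth line, starting at line 1" selection can be sketched with awk. The function name `strip_mate_suffix` is made up for illustration, and the sketch assumes a strict four-line-per-record FASTQ layout like the one in the question.

```shell
# strip_mate_suffix (hypothetical helper name): remove a trailing /1 or /2
# from FASTQ header lines only. Assumes exactly four lines per record, so
# headers are lines 1, 5, 9, ... (NR % 4 == 1).
strip_mate_suffix() {
    awk 'NR % 4 == 1 { sub(/\/[12]$/, "") } { print }' "$@"
}
```

Usage would be along the lines of `strip_mate_suffix myfile.fq > mynewfile.fq`. Sequence and quality lines are left untouched even if they happen to end in /1, because only the header line of each record is edited.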


returns: sed: 1: "1~4s/\/1$//": invalid command code ~


Am I right in guessing that you're on a Mac, then?

// EDIT // Answering myself: yes, you are. OS X ships with BSD sed, which does not understand the GNU-only 1~4 address, so you'll need to install GNU sed with Homebrew (or similar).

$ brew list
boost       hdf5        libxml2     sratoolkit  tophat
bowtie2     htslib      openssl     szip        wget
gnu-sed     libmagic    samtools    tbb

I installed gnu-sed with Homebrew. Now it returns: sed: 1: "s_188_1_seq.txt": bad flag in substitute command: 's'

EDIT: Homebrew does not replace the system sed; it installs GNU sed under the name gsed, so I called it by its full path like this:

/usr/local/Cellar/gnu-sed/4.2.2/bin/gsed -i '1~4s/\/1$//' s_188_1_seq.txt

And it worked! Thank you. <3
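For anyone scripting this on both Linux and a Mac, here is a minimal sketch of picking whichever GNU sed is on the PATH instead of hard-coding the Cellar path. The function name `pick_gnu_sed` is made up for illustration; it assumes Homebrew's gnu-sed exposes a `gsed` command and that only GNU sed answers `--version` with a "GNU sed" banner.

```shell
# pick_gnu_sed (hypothetical helper name): print the name of a GNU sed
# found on the PATH, preferring gsed (Homebrew) over sed.
# Returns non-zero if neither candidate identifies itself as GNU sed.
pick_gnu_sed() {
    for candidate in gsed sed; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           "$candidate" --version 2>/dev/null | grep -q 'GNU sed'; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

It could then be used as `SED=$(pick_gnu_sed) && "$SED" -i '1~4s/\/1$//' s_188_1_seq.txt`.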
