Question

Extracting sequences from a big FASTA file

0

9.7 years ago

vahapel ▴ 210

Dear All,

After sequencing and extensive filtering, we converted our ".fastq" files into ".fasta" files; each FASTA file has approximately 67 million reads. Is there a script for extracting the first 30 million reads and then the remaining 37 million reads, in sequential order?

Thank you for all your help!

next-gen-sequencing rna-seq • 2.5k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by vahapel ▴ 210

score 1 · Answer 1 · 2015-03-29

1

9.7 years ago

5heikki 11k

Assuming no linebreaks in sequences, this is as simple as:

head -n x file > output

Here x is the number of sequences times 2 (one line for the header, one for the sequence). Similarly, you can get the last sequences with tail, again counting two lines per sequence.
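As a rough illustration of the head/tail approach for the split asked about in the question, here is a minimal sketch. It assumes every record is exactly two lines (header plus unwrapped sequence); the file names and the tiny demo input are hypothetical, and n would be 30000000 for the real data.

```shell
# Build a tiny demo FASTA (5 records, 2 lines each); file names are hypothetical.
printf '>r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n>r4\nCCGG\n>r5\nAATT\n' > demo.fasta

n=3                 # records wanted in the first part (30000000 in the question)
lines=$((n * 2))    # two lines per record: header + sequence

# First n records.
head -n "$lines" demo.fasta > first_part.fasta

# Everything after the first n records ("tail -n +K" starts printing at line K).
tail -n +"$((lines + 1))" demo.fasta > second_part.fasta
```

Using tail -n +K for the second part avoids having to know the exact total number of reads in advance.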

ADD COMMENT • link 9.7 years ago by 5heikki 11k

0

Thank you so much, 5heikki. It seems a very practical way to do this!

ADD REPLY • link 9.7 years ago by vahapel ▴ 210

score 0 · Answer 2 · 2015-03-29

0

9.7 years ago

GouthamAtla 12k

split fasta file
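The linked scripts are not reproduced here. As one possible way to split a FASTA file into chunks of a fixed number of records, the following awk sketch may help; it counts header lines rather than raw lines, so it should also cope with sequences wrapped over several lines. The chunk size, the output prefix chunk_ and the demo input are illustrative assumptions, and the input is assumed to start with a header line.

```shell
# Tiny demo FASTA with some wrapped sequences; file names are hypothetical.
printf '>r1\nACGT\nAC\n>r2\nGGCC\n>r3\nTTAA\nTT\n>r4\nCCGG\n>r5\nAATT\n' > demo_wrapped.fasta

# Open a new output file every `size` headers and send each line to the current file.
awk -v size=2 -v prefix=chunk_ '
  /^>/ {
    if (n % size == 0) {                 # time to start the next chunk
      if (out) close(out)                # close the previous chunk file
      out = sprintf("%s%03d.fasta", prefix, ++part)
    }
    n++                                  # count records seen so far
  }
  { print > out }                        # header and sequence lines alike
' demo_wrapped.fasta
```

With size=2 and five records this writes chunk_001.fasta, chunk_002.fasta and chunk_003.fasta, the last one holding the single leftover record.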

ADD COMMENT • link 9.7 years ago by GouthamAtla 12k

0

Hi, Geek_y, "split fasta file" scripts will be useful for my project, thank you for your help!

ADD REPLY • link 9.7 years ago by vahapel ▴ 210

0

Hi,

I am a novice R user and want to split a FASTA file with 300,000 contigs into 6 files of at most 50,000 contigs each. I have seen many options, but could anyone recommend something I could use in R? Thank you, A

ADD REPLY • link 8.5 years ago by alexandra.lanot • 0