Question

Merging specific columns from different txt files into a single file

0

6.6 years ago

jivarajivaraj ▴ 50

Hi,

I counted reads in each BAM file with featureCounts, and now I have many count.txt files. How can I merge the 7th column of each txt file, plus the first column (gene ID) from one of the files, into a single txt file on macOS? This is what I tried:

$ join 7 counts24.txt counts25.txt counts26.txt counts27.txt counts28.txt counts29.txt counts30.txt counts31.txt counts32.txt counts33.txt counts34.txt counts35.txt counts36.txt counts37.txt counts38.txt counts43.txt counts44.txt counts45.txt counts46.txt counts47.txt counts48.txt counts49.txt counts50.txt counts51.txt counts52.txt counts53.txt counts54.txt counts55.txt counts56.txt counts57.txt > out.txt
usage: join [-a fileno | -v fileno ] [-e string] [-1 field] [-2 field]
            [-o list] [-t char] file1 file2

software • 3.1k views

ADD COMMENT • link updated 6.6 years ago by Alex Reynolds 36k • written 6.6 years ago by jivarajivaraj ▴ 50

1

A simple search returns multiple hits on how to run featureCounts on multiple BAM files and generate a single expression matrix with all samples.

combining quantification (featureCounts) result files into a single dataset

https://support.bioconductor.org/p/64932/

Finally, read the featureCounts manual first: it supports multiple BAM files. There is no harm in reading a software manual; they are written to help you use the tool effectively.

ADD REPLY • link 6.6 years ago by ivivek_ngs ★ 5.2k

0

Why is this a "software error"? Please use sensible tags.

What have you tried to solve this issue? Users will help you sooner if you show some effort yourself.

ADD REPLY • link 6.6 years ago by WouterDeCoster 47k

0

Sorry, I googled and tried the code above, which gave me the error shown. That is why I tagged it as a software error.

ADD REPLY • link 6.6 years ago by jivarajivaraj ▴ 50

0

Can you provide the first few lines of any two files?

ADD REPLY • link 6.6 years ago by cpad0112 21k

h.mon · Accepted Answer · 2018-05-16

4

6.6 years ago

venu 7.1k

FYI, featureCounts accepts many bam files at once and generates one count table for all BAMs.

Regarding your error, that's not the proper way to use join: as the usage message shows, it takes exactly two files, and a bare 7 is not a valid way to select a field. I guess the order of genes will be the same in all counts.txt files from featureCounts, so you can simply run paste counts*.txt after a little preprocessing (i.e. keep only the gene_id and counts columns in each file).
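The "preprocess, then paste" idea could be sketched roughly as below. This is only an illustration under some assumptions: every file lists the genes in the same order, the files are tab-delimited, and any leading lines starting with # (featureCounts typically writes one) should be dropped. merge_counts is a hypothetical helper name, not part of any tool.

```shell
# Sketch: take gene IDs (column 1) from the first file, then append
# column 7 of every file as a new column. Assumes identical gene order.
merge_counts() {
    tmpdir=$(mktemp -d)
    # Gene IDs come from the first file only; '#' comment lines are skipped.
    grep -v '^#' "$1" | cut -f1 > "$tmpdir/merged"
    for f in "$@"; do
        # '-' makes paste read the extracted count column from stdin.
        grep -v '^#' "$f" | cut -f7 | paste "$tmpdir/merged" - > "$tmpdir/next"
        mv "$tmpdir/next" "$tmpdir/merged"
    done
    cat "$tmpdir/merged"
    rm -r "$tmpdir"
}

# Example usage (assumes the files are named counts24.txt, counts25.txt, ...):
# merge_counts counts*.txt > out.txt
```

It uses only cut, paste and grep, so it should behave the same on macOS and Linux. It does not check that the gene IDs actually match across files; that remains an assumption.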

ADD COMMENT • link updated 6.6 years ago by h.mon 35k • written 6.6 years ago by venu 7.1k

score 3 · Accepted Answer · 2018-05-16

One way is to paste together a bunch of process substitutions, each of which cuts the desired column from its file:

$ paste <(cut -f1 somegeneid.txt) <(cut -f7 counts24.txt) <(cut -f7 counts25.txt) ... <(cut -f7 counts57.txt) > out.txt

Fill in ... with the rest of the substitutions for counts26.txt through counts56.txt. A script could programmatically generate and run this command for you if your files have a consistent naming scheme.
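As a rough sketch of such a script, the helper below builds the paste command as a string and leaves it to bash to run, since process substitution is a bash feature rather than plain POSIX sh. build_paste_cmd is a hypothetical name, and the sketch assumes the gene IDs can be taken from the first count file and that file names contain no single quotes.

```shell
# Sketch: generate "paste <(cut -f1 first) <(cut -f7 file1) <(cut -f7 file2) ..."
# from a list of files. The first argument also supplies the gene ID column.
build_paste_cmd() {
    cmd="paste <(cut -f1 '$1')"
    for f in "$@"; do
        # One process substitution per file, extracting column 7.
        cmd="$cmd <(cut -f7 '$f')"
    done
    printf '%s\n' "$cmd"
}

# Example usage (assumes a consistent counts*.txt naming scheme):
# bash -c "$(build_paste_cmd counts*.txt)" > out.txt
```

The shell expands counts*.txt in lexicographic order, which matches numeric order here only because all the sample numbers have two digits.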