sjneph/cutnm

Simple but useful tool for subsetting & rearranging columns by name

Shane Neph

This is often nicer than approaches like cut -f1,12,23,78 < input.txt | awk '{ print $2, $1, $3, $4 }', which are fragile when columns change position over time. cutnm can select and rearrange any number of columns in a table (tab-separated by default), and will repeat a column if you list it more than once. Column names are case-sensitive by default.

Simple examples

  • cutnm Fred,Scooby,Snack < input.tsv
  • cutnm Fred,Snack,Scooby,Fred < input.tsv
  • cutnm Velma Daphne Snack < input.tsv
  • cutnm Mom Dad Sister Aunt myfile.txt

Note that input can come from a file named as the last argument or from stdin. Columns of interest can be separated by commas or spaces.

Suppose you have a file with names such as Indiv-1,...,Indiv-100 but you need them in some other order that matches a different table. Further, suppose second-table.txt has other columns that are not of interest, such as Color, RobotID, and Age.



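# Collect the Indiv-* column names from the header of second-table.txt, in order.
indivs=$(awk 'NR == 1' second-table.txt | tr '\t' '\n' | awk '$1 ~ /^Indiv-/')
# $indivs is left unquoted on purpose so each name becomes its own argument.
cutnm $indivs first-table.txt > result.txt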

result.txt will contain the Indiv-* columns of first-table.txt, in the same order as they appear in second-table.txt.

Some of the extra bits of awk and tr are just to give a little flavor to common problems.

If you pass in a column name that does not exist, say XYZ, the output will have XYZ_NOTFOUND as that column's header, followed by NOTFOUND in that column for all remaining rows.

cutnm is just a bash script wrapping an awk statement. It might become more sophisticated in time. I use it when dealing with large matrices, because I get tired of counting out to column 73.
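
For readers curious what a "bash script wrapping an awk statement" might look like, here is a minimal sketch. It is not the actual cutnm source: the function name select_by_name is hypothetical, it only accepts a comma-separated list of names, and it only reads tab-separated input from stdin. It maps header names to positions on the first line, then prints the requested columns in the requested order, using the NOTFOUND convention described above for unknown names.

```shell
# Hypothetical helper, NOT the real cutnm: select tab-separated columns
# by header name, in the order given, reading the table from stdin.
select_by_name() {
  awk -v names="$1" '
    BEGIN { FS = OFS = "\t"; n = split(names, want, ",") }
    NR == 1 {
      # Remember the position of every header name (case-sensitive).
      for (i = 1; i <= NF; i++) pos[$i] = i
    }
    {
      for (j = 1; j <= n; j++) {
        if (want[j] in pos)
          cell = $(pos[want[j]])
        else
          # Unknown name: tag the header, pad the remaining rows.
          cell = (NR == 1 ? want[j] "_NOTFOUND" : "NOTFOUND")
        printf "%s%s", cell, (j < n ? OFS : ORS)
      }
    }
  '
}

# Example: reorder two columns and ask for one that does not exist.
printf 'Fred\tScooby\tSnack\n1\t2\t3\n' | select_by_name Snack,XYZ,Fred
```

The example prints a header line of Snack, XYZ_NOTFOUND, Fred followed by a row of 3, NOTFOUND, 1 (tab-separated). Because the lookup is keyed on the header text rather than on positions, the same call keeps working if the input later gains or reorders columns.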
