
qwarksky/dsterm

Terminal CLI Dataset check

This page collects a few short commands for quickly inspecting a dataset from the terminal in a data science workflow.

Approach

  1. We export Seaborn's Titanic dataset to a CSV file (main.py).
  2. This CSV file is our starting point. It simulates a dataset that we have received and need to process.
  3. With Zsh, we run a quick sanity check on this CSV before pre-processing it in Python.

Shape of file

  • Count lines, bytes, and the width of the longest line:
  > wc -lcL < snstitanic.csv
  • Count unique lines (index column excluded):
  > cut -d',' -f2- snstitanic.csv | sort | uniq | wc -l
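
As a small sketch of the two checks above, the snippet below builds a tiny hypothetical file, `sample.csv` (not part of the repository), and compares the total number of lines with the number of unique lines once the index column is dropped. `sort -u` is used here as a shorthand for `sort | uniq`, and the variable names are only illustrative.

```shell
# Build a tiny hypothetical CSV with the index in the first column.
# The rows with index 1 and 2 are identical once the index is dropped.
printf '%s\n' \
  ',survived,class' \
  '0,0,Third' \
  '1,1,First' \
  '2,1,First' > sample.csv

# Total number of lines, header included.
total=$(wc -l < sample.csv | tr -d ' ')

# Unique lines once the index column (field 1) is dropped.
unique=$(cut -d',' -f2- sample.csv | sort -u | wc -l | tr -d ' ')

echo "total=$total unique=$unique duplicates=$((total - unique))"
```

A non-zero difference suggests duplicated records that are worth a closer look before pre-processing.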

Sed

  • Print lines 10 to 12:
  > sed -n '10,12p' snstitanic.csv
  • Print lines matching the pattern "First":
  > sed -n '/First/p' snstitanic.csv
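
The two sed commands above can also be combined so that the header row stays visible next to the matching lines. The following is a minimal sketch on a hypothetical `sample.csv`; it assumes the header itself does not contain the pattern, otherwise line 1 would be printed twice.

```shell
# Build a tiny hypothetical CSV with the index in the first column.
printf '%s\n' \
  ',survived,class' \
  '0,0,Third' \
  '1,1,First' \
  '2,1,Second' \
  '3,1,First' > sample.csv

# Print the header (line 1) plus every line matching "First".
matches=$(sed -n '1p;/First/p' sample.csv)
echo "$matches"
```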

Awk

  • Print lines 5 to 12, each prefixed with its line number:
  > awk 'NR==5, NR==12 { print NR ":" $0 }' snstitanic.csv
  • Print lines where a specific field (here field 10) matches the pattern "First":
  > awk -F',' '$10 ~ /First/ { print "Line " NR ":" $0 }' snstitanic.csv
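
Filtering on a field requires knowing its position. The sketch below, again on a hypothetical `sample.csv`, lists the header fields with their numbers and then counts the data rows per value of one field. It assumes a plain comma split is enough, which does not hold for CSV files with quoted fields that contain commas.

```shell
# Build a tiny hypothetical CSV with the index in the first column.
printf '%s\n' \
  ',survived,class' \
  '0,0,Third' \
  '1,1,First' \
  '2,1,Second' \
  '3,1,First' > sample.csv

# List the header fields with their position, to find the column number to use.
fields=$(awk -F',' 'NR==1 { for (i = 1; i <= NF; i++) print i ": " $i }' sample.csv)
echo "$fields"

# Count the data rows per value of field 3 (the header is skipped with NR > 1).
counts=$(awk -F',' 'NR > 1 { n[$3]++ } END { for (v in n) print v ": " n[v] }' sample.csv | sort)
echo "$counts"
```

In this sample the index column has an empty header name, so the first listed field shows only its number.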

Documentation
