$ mkdir MyDataScienceToolbox
$ cd MyDataScienceToolbox
标签:
Data Science at the Command Line is a new book written by Jeroen Janssens. This website contains information about the upcoming workshop in London, the webcast from August 20th, instructions on how to install the Data Science Toolbox, and an overview of all the command-line tools discussed in the book.
This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You‘ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
Discover why the command line is an agile, scalable, and extensible technology. Even if you‘re already comfortable processing data with, say, Python or R, you‘ll greatly improve your data science workflow by also leveraging the power of the command line.
On February 20 and 21, 2015 in London, I‘ll give a two-day workshop about Data Science at the Command Line. This workshop is organized by The School of Data Science. The registration page contains information about the pricing, an overview of the topics we‘ll cover, and the prerequisites needed for this workshop. You‘ll receive a free copy of the book.
On August 20, 2014 at 17:00 UTC, I did a two-hour webcast about the book. With over 1,000 attendees, the webcast was a great success. In case you missed it you can view the recording. You may need to sign up or log in.
If you want to follow along with the code and commands presented during the webcast, you can install the Data Science Toolbox. This is an easy-to-install virtual machine that contains all the command-line tools discussed in the book. It runs on Microsoft Windows, Mac OS X, and Linux. See below for installation instructions.
In both the book and the webcast we use many different command-line tools. The distribution of GNU/Linux that we assume, Ubuntu, comes with a whole bunch of command-line tools pre-installed. Moreover, Ubuntu offers many packages that contain other, relevant command-line tools. Installing these packages yourself is not too difficult. However, we also use command-line tools that are not available as packages and require a more manual, and more involved, installation. In order to acquire the necessary command-line tools without having to go through the involved installation process of each, we encourage you to install the Data Science Toolbox.
The Data Science Toolbox is a virtual environment that allows you to get started doing data science in minutes. The default version comes with commonly used software for data science, including the Python scientific stack and R together with its most popular packages. Additional software and data bundles are easily installed. These bundles can be specific to a certain book, course, or organization. You can read more about the Data Science Toolbox in general athttp://datasciencetoolbox.org.
While there exists a bundle for this book, which contains all the data, scripts, and command-line tools used in this book, we have also created a pre-bundled Data Science Toolbox specifically for this book. The following five steps describe how to install it.
Browse to the Virtualbox download page (https://www.virtualbox.org/wiki/Downloads) and download the appropriate binary for your operating system. Open the binary and follow the installations instructions.
Similarly to Step 1, browse to the Vagrant (HashiCorp, 2014) download page (http://www.vagrantup.com/downloads.html) and download the appropriate binary. Open the binary and follow the installations instructions. If you already have Vagrant installed, please make sure that it’s version 1.5 or higher.
Open a terminal (known as the command prompt in Microsoft Windows). Create a directory, for example MyDataScienceToolbox, and navigate to it by typing:
$ mkdir MyDataScienceToolbox
$ cd MyDataScienceToolbox
In order to initialize the Data Science Toolbox, run the following command:
$ vagrant init data-science-toolbox/data-science-at-the-command-line
This creates a file named Vagrantfile. This is a configuration file that tells Vagrant how to launch the virtual machine. This file contains a lot of lines that are commented out. A minimal version is:
Vagrant.configure(2) do |config|
config.vm.box = "data-science-toolbox/data-science-at-the-command-line"
end
By running the following command, the Data Science Toolbox will be downloaded and booted.
$ vagrant up
If everything went well, then you now have a Data Science Toolbox running on your local machine.
If you ever see the message default: Warning: Connection timeout. Retrying...
printed repeatedly, then it may be that the virtual machine is waiting for input. This may happen when the virtual machine has not been properly shut down. In order to find out what’s wrong, add the following lines to Vagrantfile before the last end
statement:
config.vm.provider "virtualbox" do |vb|
vb.gui = true
end
This will cause VirtualBox to show a screen. Once the virtual machine has booted and you have identified the problem, you can remove these lines from Vagrantfile. The username and password are both vagrant.
Here‘s a slightly more elaborate Vagrantfile:
Vagrant.require_version ">= 1.5.0"
Vagrant.configure(2) do |config|
config.vm.box = "data-science-toolbox/data-science-at-the-command-line"
config.vm.network "forwarded_port", guest: 8000, host: 8000
config.vm.provider "virtualbox" do |vb|
vb.gui = true
vb.memory = 2048
vb.cpus = 2
end
end
This file, which you can download here, ensures that Vagrant:
You can view more configuration option at http://docs.vagrantup.com/v2.
If you are running Linux, Mac OS X, or some other UNIX-like operating system, you can log in to the Data Science Toolbox by running the following command in a terminal:
$ vagrant ssh
After a few seconds you will be greeted with the following message:
Welcome to the Data Science Toolbox for Data Science at the Command Line
Based on Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
* Data Science at the Command Line: http://datascienceatthecommandline.com
* Data Science Toolbox: http://datasciencetoolbox.org
* Ubuntu documentation: http://help.ubuntu.com
Last login: Tue Jul 22 19:33:16 2014 from 10.0.2.2
If you are running Microsoft Windows, you need to either run Vagrant with a graphical user interface (see above on how to set that up) or use a third-party application in order to log in to the Data Science Toolbox. We recommend PuTTY for this. Browse to the PuTTY download page (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) and downloadputty.exe. Run PuTTY, and enter the following values:
If you want, you can save these values as a session by clicking the Save button, so that you do not need to enter these values again. Click the Open button and enter vagrant for both the username and the password.
Currently, the Data Science Toolbox does not contain all the data and scripts. An update will be uploaded once the book is finished. For now, in order to obtain the data and scripts, you run the following:
$ cd ~/book
$ git pull
This is an overview of all the command-line tools discussed in the book. (Please note that, due to time constraints, the webcast will only discuss a subset.) This includes binary executables, interpreted scripts, and Bash builtins and keywords. For each command-line tool, we state, when available and appropriate, the following information:
All command-line tools listed here are included in the Data Science Toolbox for Data Science at the Command Line. See the previous sections for instructions on how to set it up. The install commands assume that you‘re running Ubuntu 14.04. Please note that citing open source software is not trivial, and that some information may be missing or incorrect.
Define or display aliases. Alias is a Bash builtin.
$ help alias
$ alias ll=‘ls -alF‘
Pattern scanning and text processing language. Mawk (version 1.3.3) by Mike Brennan (1994).http://invisible-island.net/mawk.
$ sudo apt-get install mawk
$ man awk
$ seq 5 | awk ‘{sum+=$1} END {print sum}‘
15
Manage AWS Services such as EC2 and S3 from the command line. AWS Command Line Interface (version 1.3.24) by Amazon Web Services (2014). http://aws.amazon.com/cli.
$ sudo pip install awscli
$ aws help
$ aws ec2 describe-regions | head -n 5
{
"Regions": [
{
"Endpoint": "ec2.eu-west-1.amazonaws.com",
"RegionName": "eu-west-1"
GNU Bourne-Again SHell. Bash (version 4.3) by Brian Fox and Chet Ramey (2010).http://www.gnu.org/software/bash.
$ sudo apt-get install bash
$ man bash
Evaluate equation from standard input. Bc (version 1.06.95) by Philip A. Nelson (2006).http://www.gnu.org/software/bc.
$ sudo apt-get install bc
$ man bc
$ echo ‘e(1)‘ | bc -l
2.71828182845904523536
Access BigML’s prediction API. BigMLer (version 1.12.2) by BigML (2014).http://bigmler.readthedocs.org.
$ sudo pip install bigmler
$ bigmler --help
Apply expression to all but the first line. Useful if you want to apply classic command-line tools to CSV files with a header. Body by Jeroen H.M. Janssens (2014).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ echo -e "value\n7\n2\n5\n3" | body sort -n
value
2
3
5
7
Concatenate files and standard input, and print on standard output. Cat (version 8.21) by Torbjorn Granlund and Richard M. Stallman (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man cat
$ cat results-01 results-02 results-03 > results-all
Change the shell working directory. Cd is a Bash builtin.
$ help cd
$ cd ~; pwd; cd ..; pwd
/home/vagrant
/home
Change file mode bits. We use it to make our command-line tools executable. Chmod (version 8.21) by David MacKenzie and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man chmod
$ chmod u+x experiment.sh
Apply a command to a subset of the columns and merge the result back with the remaining columns. Cols by Jeroen H.M. Janssens (2014). https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ < iris.csv cols -C species body tapkee --method pca | header -r x,y,species
Generate an ASCII picture of a cow with a message. Particularly useful when building up a particular pipeline is starting to frustrate you a bit too much. Cowsay (version 3.03+dfsg1) by Tony Monroe (1999).
$ sudo apt-get install cowsay
$ man cowsay
$ echo ‘The command line is awesome!‘ | cowsay
______________________________
< The command line is awesome! >
------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/ ||----w |
|| ||
Copy files and directories. Cp (version 8.21) by Torbjorn Granlund, David MacKenzie, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man cp
Extract columns from CSV data. Like cut
command-line tool, but for tabular data. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvcut --help
Filter tabular data to only those rows where certain columns contain a given value or match a regular expression. Csvkit (version 0.8.0) by Christopher Groskopf (2014).http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvgrep --help
Merge two or more CSV tables together using a method analogous to SQL JOIN operation. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvjoin --help
Renders a CSV file to the command line in a readable, fixed-width format. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvlook --help
$ echo -e "a,b\n1,2\n3,4" | csvlook
|----+----|
| a | b |
|----+----|
| 1 | 2 |
| 3 | 4 |
|----+----|
Sort CSV files. Like the sort
command-line tool, but for tabular data. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvsort --help
Execute SQL queries directly on CSV data or insert CSV into a database. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvsql --help
Stack up the rows from multiple CSV files, optionally adding a grouping value to each row. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvstack --help
Print descriptive statistics for all columns in a CSV file. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ csvstat --help
Download data from a URL. cURL (version 7.35.0) by Daniel Stenberg (2012).http://curl.haxx.se.
$ sudo apt-get install curl
$ man curl
Remove sections from each line of files. Cut (version 8.21) by David M. Ihnat, David MacKenzie, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man cut
Display an image or image sequence on any X server. Can read image data from standard input. Display (version 8:6.7.7.10) by ImageMagick Studio LLC (2009). http://www.imagemagick.org.
$ sudo apt-get install imagemagick
$ man display
Manage a data workflow. Drake (version 0.1.6) by Factual (2014).https://github.com/Factual/drake.
$ # Please see Chapter 6 for installation instructions.
$ drake --help
Generate sequence of dates relative to today. Dseq by Jeroen H.M. Janssens (2014).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ dseq -2 0 # day before yesterday till today
2014-07-15
2014-07-16
2014-07-17
Display a line of text. Echo (version 8.21) by Brian Fox and Chet Ramey (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man echo
Run a program in a modified environment. It’s often used to specify which interpreter should run our script. Env (version 8.21) by Richard Mlynarik and David MacKenzie (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man env
$ #!/usr/bin/env python
Set export attribute for shell variables. Useful for making shell variables available to other command-line tools. Export is a Bash builtin.
$ help export
$ export WEKAPATH=$HOME/bin
Generate a script for gnuplot
while passing data to standard input. Feedgnuplot (version 1.32) by Dima Kogan (2014). http://search.cpan.org/perldoc?feedgnuplot.
$ sudo apt-get install feedgnuplot
$ man feedgnuplot
Search for files in a directory hierarchy. Find (version 4.4.2) by James Youngman (2008).http://www.gnu.org/software/findutils.
$ sudo apt-get install findutils
$ man find
Execute commands for each member in a list. In Chapter 8, we discuss the advantages of usingparallel
instead of for
. For is a Bash keyword.
$ help for
$ for i in {A..C} "It‘s easy as" {1..3}; do echo $i; done
A
B
C
It‘s easy as
1
2
3
Manage repositories for git, which is a distributed version control system. Git (version 1:1.9.1) by Linus Torvalds and Junio C. Hamano (2014). http://git-scm.com.
$ sudo apt-get install git
$ man git
Print lines matching a pattern. Grep (version 2.16) by Jim Meyering (2012).http://www.gnu.org/software/grep.
$ sudo apt-get install grep
$ man grep
Output the first part of files. Head (version 8.21) by David MacKenzie and Jim Meyering (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man head
$ seq 5 | head -n 3
1
2
3
Add, replace, and delete header lines. Header by Jeroen H.M. Janssens (2014).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ header -h
Convert common, but less awesome, tabular data formats to CSV. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ in2csv --help
Process JSON. Jq (version jq-1.4) by Stephen Dolan (2014). http://stedolan.github.com/jq.
$ # See website for installation instructions
$ # See website for documentation
Convert JSON to CSV. Json2Csv (version 1.1) by Jehiah Czebotar (2014).https://github.com/jehiah/json2csv.
$ go get github.com/jehiah/json2csv
$ json2csv --help
Paginate large files. Less (version 458) by Mark Nudelman (2013).http://www.greenwoodsoftware.com/less.
$ sudo apt-get install less
$ man less
$ csvlook iris.csv | less
List directory contents. Ls (version 8.21) by Richard M. Stallman and David MacKenzie (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man ls
Read reference manuals of command-line tools. Man (version 2.6.7.1) by John W. Eaton and Colin Watson (2014).
$ sudo apt-get install man
$ man man
$ man grep
Make directories. Mkdir (version 8.21) by David MacKenzie (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man mkdir
Move or rename files and directories. Mv (version 8.21) by Mike Parker, David MacKenzie, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man mv
Build and execute shell command lines from standard input in parallel. GNU Parallel (version 20140622) by Ole Tange (2014). http://www.gnu.org/software/parallel.
$ # See website for installation instructions
$ man parallel
$ seq 3 | parallel echo Processing file {}.csv
Processing file 1.csv
Processing file 2.csv
Processing file 3.csv
Merge lines of files. Paste (version 8.21) by David M. Ihnat and David MacKenzie (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man paste
Run bc
with parallel
. First column of input CSV is mapped to {1}
, second to {2}
, and so forth. Pbc by Jeroen H.M. Janssens (2014). https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ seq 5 | pbc ‘{1}^2‘
1
4
9
16
25
Install and manage Python packages. Pip (version 1.5.4) by PyPA (2014). https://pip.pypa.io.
$ sudo apt-get install python-pip
$ man pip
Print name of current working directory. Pwd (version 8.21) is a Bash builtin by Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ man pwd
$ pwd
/home/vagrant
Execute Python, which is an interpreted, interactive, and object-oriented programming language. Python (version 2.7.5) by Python Software Foundation (2014).http://www.python.org.
$ sudo apt-get install python
$ man python
Analyze data and create visualizations with the R
programming language. To install the latest version of R
on Ubuntu, follow the instructions on http://cran.r-project.org/bin/linux/ubuntu/README. R (version 3.1.1) by R Foundation for Statistical Computing (2014). http://www.r-project.org.
$ sudo apt-get install r-base-dev
$ man R
Load CSV from standard input into R as a data.frame, execute given commands, and get the output as CSV or PNG. Rio by Jeroen H.M. Janssens (2014).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ Rio -h
$ seq 10 | Rio -nf sum
55
Create a scatter plot from CSV using Rio
. Rio-Scatter by Jeroen H.M. Janssens (2014).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ < iris.csv Rio-scatter sepal_length sepal_width species > iris.png
Remove files or directories. Rm (version 8.21) by Paul Rubin, David MacKenzie, Richard M. Stallman, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man rm
Run machine learning experiments with scikit-learn. SciKit-Learn Laboratory (version 0.26.0) by Educational Testing Service (2014). https://skll.readthedocs.org.
$ sudo pip install skll
$ run_experiment --help
Print lines from standard output with a given probability, for a given duration, and with a given delay between lines. Sample by Jeroen H.M. Janssens (2014).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ sample --help
Copy remote files securely. Scp (version 1:6.6p1) by Timo Rinne and Tatu Ylonen (2014).http://www.openssh.com.
$ sudo apt-get install openssh-client
$ man scp
Extract HTML elements using an XPath query or CSS3 selector. Scrape by Jeroen H.M. Janssens (2014). https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ curl -sL ‘http://datasciencetoolbox.org‘ | scrape -e ‘head > title‘
<title>Data Science Toolbox</title>
Filter and transform text. Sed (version 4.2.2) by Jay Fenlason, Tom Lord, Ken Pizzini, and Paolo Bonzini (2012). http://www.gnu.org/software/sed.
$ sudo apt-get install sed
$ man sed
Print a sequence of numbers. Seq (version 8.21) by Ulrich Drepper (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man seq
$ seq 5
1
2
3
4
5
Generate random permutations. Shuf (version 8.21) by Paul Eggert (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man shuf
Sort lines of text files. Sort (version 8.21) by Mike Haertel and Paul Eggert (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man sort
Split a file into pieces. Split (version 8.21) by Torbjorn Granlund and Richard M. Stallman (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man split
Executes arbitrary commands against an SQL database and outputs the results as a CSV. Csvkit (version 0.8.0) by Christopher Groskopf (2014). http://csvkit.readthedocs.org.
$ sudo pip install csvkit
$ sql2csv --help
Login to remote machines. OpenSSH client (version 1.8.9) by Tatu Ylonen, Aaron Campbell, Bob Beck, Markus Friedl, Niels Provos, Theo de Raadt, Dug Song, and Markus Friedl (2014).http://www.openssh.com.
$ sudo apt-get install ssh
$ man ssh
Execute a command as another user. Sudo (version 1.8.9p5) by Todd C. Miller (2013).http://www.sudo.ws/sudo.
$ sudo apt-get install sudo
$ man sudo
Output the last part of files. Tail (version 8.21) by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man tail
$ seq 5 | tail -n 3
3
4
5
Reduce dimensionality of a data set using various algorithms. Tapkee by Sergey Lisitsyn and Fernando Iglesias (2014). http://tapkee.lisitsyn.me.
$ # See website for installation instructions
$ tapkee --help
$ < iris.csv cols -C species body tapkee --method pca | header -r x,y,species
Create, list, and extract tar archives. Tar (version 1.27.1) by Jeff Bailey, Paul Eggert, and Sergey Poznyakoff (2014). http://www.gnu.org/software/tar.
$ sudo apt-get install tar
$ man tar
Read from standard input and write to standard output and files. Tee (version 8.21) by Mike Parker, Richard M. Stallman, and David MacKenzie (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man tee
Translate or delete characters. Tr (version 8.21) by Jim Meyering (2012).http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man tr
List contents of directories in a tree-like format. Tree (version 1.6.0) by Steve Baker (2014).https://launchpad.net/ubuntu/+source/tree.
$ sudo apt-get install tree
$ man tree
Display the type of a command-line tool. Type is a Bash builtin.
$ help type
$ type cd
cd is a shell builtin
Report or omit repeated lines. Uniq (version 8.21) by Richard M. Stallman and David MacKenzie (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man uniq
Extract common file formats. Unpack by Patrick Brisbin (2013).https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
$ unpack file.tgz
Extract files from rar archives. Unrar (version 1:0.0.1+cvs20071127) by Ben Asselstine, Christian Scheurer, and Johannes Winkelmann (2014). http://home.gna.org/unrar.
$ sudo apt-get install unrar-free
$ man unrar
List, test and extract compressed files in a ZIP archive. Unzip (version 6.0) by Samuel H. Smith (2009). http://www.info-zip.org/pub/infozip.
$ sudo apt-get install unzip
$ man unzip
Print newline, word, and byte counts for each file. Wc (version 8.21) by Paul Rubin and David MacKenzie (2012). http://www.gnu.org/software/coreutils.
$ sudo apt-get install coreutils
$ man wc
$ echo ‘hello world‘ | wc -c
12
Weka is a collection of machine learning algorithms for data mining tasks by Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. This command-line tool allows you to run Weka from the command line. Weka command-line tool by Jeroen H.M. Janssens (2014). https://github.com/jeroenjanssens/data-science-at-the-command-line.
$ git clone https://github.com/jeroenjanssens/data-science-at-the-command-line.git
Locate a command-line tool. Does not work for Bash builtins. Which by Unknown (2009).
$ man which
$ which man
/usr/bin/man
Convert XML to JSON Xml2Json (version 0.0.2) by Francois Parmentier (2014).https://github.com/parmentf/xml2json.
$ npm install xml2json-command
$ xml2json < input.xml > output.json
【转载】Data Science at the Command Line
标签:
原文地址:http://www.cnblogs.com/daleloogn/p/4235069.html