Getting Started
Data scraping and processing code is organised into modular, extendable jobs written in JavaScript or CoffeeScript. A typical node.io job takes some input, processes or reduces it in some way, and then outputs the emitted results. No step is compulsory; some jobs may not require any input, for example.
Jobs can be run from the command line or through a web interface. To run a job from the command line, run the following (the file extension can be omitted):
$ node.io myjob
To run jobs through the web interface, copy your jobs to ~/.node_modules and run
$ node.io-web
Let's run through some simple examples highlighting the anatomy of a job. Each example includes a JavaScript and a CoffeeScript version, and omits the required line var nodeio = require('node.io');
Example 1: Hello World!
hello.js
exports.job = new nodeio.Job({
    input: false,
    run: function() {
        this.emit('Hello World!');
    }
});
hello.coffee
class Hello extends nodeio.JobClass
    input: false
    run: (num) -> @emit 'Hello World!'

@class = Hello
@job = new Hello()
Example 2: Double each element of input
double.js
exports.job = new nodeio.Job({
    input: [0,1,2],
    run: function(num) {
        this.emit(num * 2);
    }
});
double.coffee
class Double extends nodeio.JobClass
    input: [0,1,2]
    run: (num) -> @emit num * 2

@class = Double
@job = new Double()
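To make the data flow of Examples 1 and 2 concrete, the following is a simplified, synchronous model of a job run written in plain JavaScript. It does not use node.io: runJobSketch is a hypothetical helper, and the assumption that input: false means run() is called once with no argument is a reading of Example 1, not a statement about node.io's internals.

```javascript
// Hypothetical helper: a simplified, synchronous model of a job run.
// node.io is not required here, and its real scheduler works differently.
function runJobSketch(spec) {
    var results = [];
    var context = {
        // Collect whatever run() emits
        emit: function (value) { results.push(value); }
    };
    // Assumption: input: false means run() is called once with no argument
    var input = spec.input === false ? [undefined] : spec.input;
    input.forEach(function (element) {
        spec.run.call(context, element);
    });
    return results;
}

var doubled = runJobSketch({
    input: [0, 1, 2],
    run: function (num) {
        this.emit(num * 2);
    }
});
console.log(doubled); // [ 0, 2, 4 ]
```

Each element of input is handed to run() in turn, and every value passed to emit() ends up in the job's output.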
Example 3: Extend the previous example to quadruple elements
quad.js
var double = require('./double').job;
exports.job = double.extend({
    run: function(num) {
        this.__super__.run(num * 2);
        // Same as: this.emit(num * 4)
    }
});
quad.coffee
Double = require('./double').class
class Quad extends Double
    run: (num) -> super num * 2

@class = Quad
@job = new Quad()
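The delegation in Example 3 can be illustrated without node.io using ordinary prototype inheritance. This is only a sketch of the idea (the child's run() doubles the value and then hands it to the parent's run()); doubleJob and quadJob are hypothetical stand-ins, and node.io's own extend() may be implemented differently.

```javascript
// Hypothetical stand-ins for the Double and Quad jobs; not node.io classes.
var doubleJob = {
    results: [],
    emit: function (value) { this.results.push(value); },
    run: function (num) { this.emit(num * 2); }
};

// Object.create makes doubleJob the prototype, so emit() is inherited
var quadJob = Object.create(doubleJob);
quadJob.results = [];
quadJob.run = function (num) {
    // Delegate to the parent's run() with the value already doubled
    doubleJob.run.call(this, num * 2);
};

[0, 1, 2].forEach(function (num) { quadJob.run(num); });
console.log(quadJob.results); // [ 0, 4, 8 ]
```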
Example 1: Files
Files can be read and written in two ways: (1) they can be specified inside the job, or (2) they can be specified at the command line, since node.io reads input from stdin (elements are separated by \n or \r\n) and writes output to stdout.
csv_to_tsv.js
exports.job = new nodeio.Job({
    input: 'input.csv',
    run: function(row) {
        // Use a global regex so that every comma is replaced, not just the first
        this.emit(row.replace(/,/g, '\t'));
    },
    output: 'output.tsv'
});
csv_to_tsv.coffee - (input / output must be specified at the command line)
class CsvTsv extends nodeio.JobClass
    run: (row) -> @emit row.replace(/,/g, '\t')

@class = CsvTsv
@job = new CsvTsv()
The following two commands are equivalent:
$ node.io csv_to_tsv.js
$ node.io csv_to_tsv.coffee < input.csv > output.tsv
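As a rough illustration of what happens to the stdin stream in the second command, the sketch below splits text on \n or \r\n and converts each row. csvToTsv is a hypothetical helper written in plain JavaScript, not a node.io function, and the naive comma replacement does not handle quoted CSV fields.

```javascript
// Hypothetical helper, not part of node.io
function csvToTsv(text) {
    return text
        .split(/\r?\n/) // elements are separated by \n or \r\n
        .filter(function (row) { return row.length > 0; }) // drop empty rows
        .map(function (row) { return row.replace(/,/g, '\t'); })
        .join('\n');
}

console.log(JSON.stringify(csvToTsv('a,b,c\r\n1,2,3\n'))); // "a\tb\tc\n1\t2\t3"
```

Note that String.prototype.replace with a string pattern such as ',' replaces only the first match, so a global regular expression is needed to convert rows with more than two columns.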
Example 2: Database
To read rows from a database, use the following template. start is the offset of the first row to return (it begins at 0) and num is the number of rows to return. When there are no more rows, return false.
database_template.js
exports.job = new nodeio.Job({
    input: function(start, num, callback) {
        //
    },
    run: function(row) {
        //
    },
    output: function(rows) {
        // Note: this method always receives multiple rows as an array
        // Output rows
    }
});
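The paging contract of the template can be modelled with an in-memory array in place of a real database. Everything here is hypothetical: table, jobSpec, and the drive function are illustrative names, and the sketch assumes that rows are handed to callback and that passing false to callback signals the end of input. The template above only says to return false, so check the node.io documentation for the exact convention.

```javascript
// Hypothetical in-memory "table" standing in for a real database
var table = ['alice', 'bob', 'carol', 'dave', 'erin'];

var jobSpec = {
    input: function (start, num, callback) {
        var rows = table.slice(start, start + num);
        // Assumption: rows go to the callback, and false signals the end
        callback(rows.length > 0 ? rows : false);
    },
    run: function (row) {
        this.emit(row.toUpperCase());
    },
    output: function (rows) {
        // rows is always an array
        rows.forEach(function (row) { console.log(row); });
    }
};

// Hypothetical driver that models the paging loop; node.io does this itself
function drive(spec, pageSize) {
    var collected = [];
    var start = 0;
    var done = false;
    var context = { emit: function (value) { collected.push(value); } };
    while (!done) {
        spec.input(start, pageSize, function (rows) {
            if (rows === false) { done = true; return; }
            rows.forEach(function (row) { spec.run.call(context, row); });
            start += rows.length;
        });
    }
    spec.output(collected);
    return collected;
}

var processed = drive(jobSpec, 2);
```

With a page size of 2, input() is called with start values 0, 2, 4 and finally 5, at which point the slice is empty and the loop ends.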