@@ -24,19 +24,17 @@ Creating a JDBC river is easy:
Assuming you have a table named `orders`, you can issue this simple command from the command line:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
-   "max_bulk_actions" : 10000,
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
-       "fetchsize" : "min",
        "sql" : "select * from orders"
    }
}'

- Note: the `max_bulk_actions` is set by default to 100 and have to be enlarged for most use cases, and
+ Note: the `max_bulk_actions` parameter is set by default to 10000 and may have to be enlarged for some use cases, and
MySQL streaming mode is activated only by setting the row fetch size to Integer.MIN_VALUE, which can be
achieved by using the string `"min"` for the parameter `fetchsize`.
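
For example, a river definition that sets both parameters explicitly, mirroring the structure shown above, might look like the following sketch (the values are illustrative, not recommendations):

```
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "max_bulk_actions" : 10000,
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "fetchsize" : "min",
        "sql" : "select * from orders"
    }
}'
```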
@@ -171,13 +169,11 @@ Internet access (of course)
```
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
-   "max_bulk_actions" : 10000,
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
-       "fetchsize" : "min",
        "sql" : "select * from orders"
    }
}'
@@ -215,14 +211,12 @@ The general schema of a JDBC river instance declaration is
Example:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
-   "max_bulk_actions" : 1000,
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders",
-       "fetchsize" : "min",
        "index" : "myindex",
        "type" : "mytype",
        ...
@@ -267,9 +261,9 @@ Quartz cron expression format (see below).
`interval` - a time value for the delay between two river runs (default: not set)

- `max_bulk_actions` - the length of each bulk index request submitted
+ `max_bulk_actions` - the length of each bulk index request submitted (default: 10000)

- `max_concurrrent_bulk_requests` - the maximum number of concurrent bulk requests
+ `max_concurrent_bulk_requests` - the maximum number of concurrent bulk requests (default: 2 * number of CPU cores)

`max_bulk_volume` - a byte size parameter for the maximum volume allowed for a bulk request (default: "10m")
@@ -372,7 +366,7 @@ Quartz cron expression format (see below).
    "schedule" : null,
    "interval" : 0L,
    "threadpoolsize" : 4,
-   "max_bulk_actions" : 100,
+   "max_bulk_actions" : 10000,
    "max_concurrent_bulk_requests" : 2 * available CPU cores,
    "max_bulk_volume" : "10m",
    "max_request_wait" : "60s",
@@ -387,7 +381,7 @@ Quartz cron expression format (see below).
    "rounding" : null,
    "scale" : 2,
    "autocommit" : false,
-   "fetchsize" : 10,
+   "fetchsize" : 10, /* MySQL: Integer.MIN_VALUE */
    "max_rows" : 0,
    "max_retries" : 3,
    "max_retries_wait" : "30s",
@@ -502,64 +496,61 @@ It is very important to note that overuse of overflowing ranges creates ranges t
and no effort has been made to determine which interpretation CronExpression chooses.
An example would be "0 0 14-6 ? * FRI-MON".
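
As a sketch of how such an expression is used, a river that should run every day at 6 AM could carry a Quartz cron expression in the `schedule` parameter, placed alongside the other `jdbc` parameters as in the defaults listed above (the expression and connection values are illustrative):

```
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders",
        "schedule" : "0 0 6 ? * *"
    }
}'
```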

- ## How to start a JDBC feeder
-
- In the `bin/feeder` directory, you find some feeder examples.
-
- A feed can be started from the `$ES_HOME/plugins/jdbc` folder. If not already present, you should
- create a `bin` folder so it is easy to maintain feeder script side by side with the river.
+ ## How to run a standalone JDBC feeder

- The feeder script must include the Elasticsearch core libraries into the classpath. Note the `-cp`
- parameter.
+ A feeder can be started from a shell script. For this, the Elasticsearch home directory must be set in
+ the environment variable ES_HOME. The JDBC plugin jar must be placed in the same directory as the script,
+ together with the JDBC driver jar(s).

- Here is an example of a feeder bash script in `$ES_HOME/plugins/jdbc/bin/feeder/oracle.create.sh`
+ Here is an example of a feeder bash script:

#!/bin/sh

- java="/usr/bin/java"
- #java="/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java"
- #java="/usr/java/jdk1.8.0/bin/java"
-
+ DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+ # ES_HOME required to detect elasticsearch jars
+ export ES_HOME=~es/elasticsearch-1.4.0.Beta1
+

echo '
{
    "elasticsearch" : {
        "cluster" : "elasticsearch",
        "host" : "localhost",
        "port" : 9300
    },
-   "concurrency" : 1,
    "type" : "jdbc",
    "jdbc" : {
-       "url" : "jdbc:oracle:thin:@//host:1521/sid",
-       "user" : "user",
-       "password" : "password",
-       "sql" : "select or_id as \"_id\", or_tan as \"tan\" from orders",
-       "index" : "myoracle",
-       "type" : "myoracle",
-       "index_settings" : {
-           "index" : {
-               "number_of_shards" : 1,
-               "number_of_replica" : 0
-           }
-       }
-   }
+       "url" : "jdbc:mysql://localhost:3306/test",
+       "user" : "",
+       "password" : "",
+       "sql" : "select *, page_id as _id from page",
+       "treat_binary_as_string" : true,
+       "index" : "metawiki"
+   }
}
- ' | ${java} \
-     -cp $(pwd):$(pwd)/\*:$(pwd)/../../lib/\* \
+ ' | java \
+     -cp "${DIR}/*" \
      org.xbib.elasticsearch.plugin.jdbc.feeder.Runner \
      org.xbib.elasticsearch.plugin.jdbc.feeder.JDBCFeeder

- The `jdbc` parameter structure is exactly the same as in a river.
+ How does it work?
+
+ - first, the shell script determines the directory it is located in and stores it in the variable `DIR`

- The feeder is invoked by `JDBCFeeder` class and understands some more parameters. In this example,
- the default parameters are shown.
+ - second, the location of the Elasticsearch home is exported in the shell variable `ES_HOME`

- `elasticsearch` - an structure describing cluster, host, and port of a host of an Elasticsearch cluster.
+ - the classpath must be set to `${DIR}/*` to detect the JDBC plugin jar in the same directory as the script

- `concurrency` - how many `jdbc` jobs should be executed in parallel
+ - the "Runner" class is able to expand the classpath over the Elasticsearch jars in `ES_HOME/lib` and also looks in `ES_HOME/plugins/jdbc`

- In the example, you can also see that you can change your favorite `java` executable when
- executing a feed. You must use a Java JDK >= 1.7
+ - the "Runner" class invokes the "JDBCFeeder", which reads a JSON definition from stdin that corresponds to a JDBC river definition
+
+ - the `elasticsearch` structure specifies the cluster, host, and port of a connection to an Elasticsearch cluster
+
+ The `jdbc` parameter structure in the definition is exactly the same as in a river.
+
+ It is possible to write an equivalent of this bash script for Windows.
+ If you can send one to me for documentation on this page, I'd be very grateful.
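
For completeness, here is one way the feeder could be run, assuming the script above has been saved as `feeder.sh` next to the plugin and driver jars (the file name is just an example). Checking the document count afterwards is a quick way to verify that the feed ran:

```
chmod +x feeder.sh
./feeder.sh
# verify that documents arrived in the target index
curl 'localhost:9200/metawiki/_count?pretty'
```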
## Structured objects
@@ -812,56 +803,40 @@ in callable statement result sets.
If there is more than one result set returned by a callable statement,
the JDBC plugin enters a loop and iterates through all result sets.

- # Monitoring the JDBC river state
+ # Monitoring the JDBC plugin state

While a river/feed is running, you can monitor the activity by using the `_state` command.

- When running very large data fetches, it might be of interest to find out if the fetch is complete or still running.
+ The `_state` command can show the state of a specific river or of all rivers,
+ when an asterisk `*` is used as the river name.

- The `_state` command can show the state of a specific river or of all rivers, when an asterisk `*` is used as the river name.
+ The river state mechanism is specific to the JDBC plugin implementation. It is part of the cluster metadata.

- In the result, you can evaluate the field `active`. If set to `true`, the river is actively fetching data from the database.
+ In the response, the field `started` represents the time when the river/feeder was created.
+ The field `last_active_begin` represents the last time a river/feeder run began, and
+ the field `last_active_end` is null while the river/feeder is running, or represents the last time the river/feeder
+ completed a run.

- In the field `timestamp`, the latest state modification of the river is recorded.
+ The `map` carries some flags for the river: `aborted`, `suspended`, and a `counter` for the number of
+ invocations on this node.

Example:

curl 'localhost:9200/_river/jdbc/*/_state?pretty'
{
    "state" : [ {
-       "name" : "my_oracle_river",
+       "name" : "feeder",
        "type" : "jdbc",
-       "enabled" : true,
-       "started" : "2014-05-10T20:29:04.260Z",
-       "timestamp" : "2014-05-10T20:52:15.866Z",
-       "counter" : 3,
-       "active" : true,
-       "custom" : {
-           "rivername" : "feeder",
-           "settings" : {
-               "index" : "myoracle",
-               "sql" : [ "select or_id as \"_id\", or_tan as \"tan\" from orders" ],
-               "maxbulkactions" : 10,
-               "type" : "myoracle",
-               "password" : "...",
-               "user" : "...",
-               "url" : "jdbc:oracle:thin:@//localhost:1521/sid"
-           },
-           "locale" : "de_",
-           "job" : null,
-           "sql" : [ "statement=select or_id as \"_id\", or_tan as \"tan\" from orders parameter=[] callable=false" ],
-           "autocommit" : false,
-           "fetchsize" : 10,
-           "maxrows" : 0,
-           "retries" : 3,
-           "maxretrywait" : "30s",
-           "resultsetconcurrency" : "CONCUR_UPDATABLE",
-           "resultsettype" : "TYPE_FORWARD_ONLY",
-           "rounding" : 0,
-           "scale" : 2
+       "started" : "2014-10-18T13:38:14.436Z",
+       "last_active_begin" : "2014-10-18T17:46:47.548Z",
+       "last_active_end" : "2014-10-18T13:42:57.678Z",
+       "map" : {
+           "aborted" : false,
+           "suspended" : false,
+           "counter" : 6
        }
    } ]
-   }
+   }
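
To check a single river instead of all rivers, the asterisk can be replaced by the river name; here `my_jdbc_river` is used, as in the earlier examples:

```
curl 'localhost:9200/_river/jdbc/my_jdbc_river/_state?pretty'
```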
# Advanced topics