Commit 3d89811

Merge branch 'master' into 1.4
2 parents: cbac853 + 946131a

File tree: 13 files changed, +191 −228 lines

README.md

Lines changed: 57 additions & 82 deletions
@@ -24,19 +24,17 @@ Creating a JDBC river is easy:
 Assuming you have a table of name `orders`, you can issue this simple command from the command line

     curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
-        "max_bulk_actions" : 10000,
         "type" : "jdbc",
         "jdbc" : {
             "url" : "jdbc:mysql://localhost:3306/test",
             "user" : "",
             "password" : "",
-            "fetchsize" : "min",
             "sql" : "select * from orders"
         }
     }'


-Note: the `max_bulk_actions` is set by default to 100 and have to be enlarged for most use cases, and
+Note: the `max_bulk_actions` parameter is set by default to 10000 and has to be enlarged for most use cases, and
 MySQL streaming mode is activated only by setting the row fetch size to Integer.MIN_VALUE, which can be
 achieved by using the string `"min"` for the parameter `fetchsize`.

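As an editorial aside on the streaming note in the hunk above, here is a minimal sketch of a river definition with MySQL streaming enabled via `"fetchsize" : "min"`. The host, table, and river name are the illustrative values from the README; the curl call is left commented out because it needs a running Elasticsearch node.

```shell
#!/bin/sh
# Sketch only: a river definition that turns on MySQL streaming mode.
# The plugin maps the string "min" to Integer.MIN_VALUE for the row fetch size.
BODY='{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "fetchsize" : "min",
        "sql" : "select * from orders"
    }
}'
echo "$BODY"
# With a node running, the river would be created like this:
# echo "$BODY" | curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d @-
```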
@@ -171,13 +169,11 @@ Internet access (of course)

 ```
 curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
-    "max_bulk_actions" : 10000,
     "type" : "jdbc",
     "jdbc" : {
         "url" : "jdbc:mysql://localhost:3306/test",
         "user" : "",
         "password" : "",
-        "fetchsize" : "min",
         "sql" : "select * from orders"
     }
 }'
@@ -215,14 +211,12 @@ The general schema of a JDBC river instance declaration is
 Example:

     curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
-        "max_bulk_actions" : 1000,
         "type" : "jdbc",
         "jdbc" : {
             "url" : "jdbc:mysql://localhost:3306/test",
             "user" : "",
             "password" : "",
             "sql" : "select * from orders",
-            "fetchsize" : "min",
             "index" : "myindex",
             "type" : "mytype",
             ...
@@ -267,9 +261,9 @@ Quartz cron expression format (see below).

 `interval` - a time value for the delay between two river runs (default: not set)

-`max_bulk_actions` - the length of each bulk index request submitted
+`max_bulk_actions` - the length of each bulk index request submitted (default: 10000)

-`max_concurrrent_bulk_requests` - the maximum number of concurrent bulk requests
+`max_concurrent_bulk_requests` - the maximum number of concurrent bulk requests (default: 2 * number of CPU cores)

 `max_bulk_volume` - a byte size parameter for the maximum volume allowed for a bulk request (default: "10m")
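For illustration, the three bulk parameters documented in the hunk above could be overridden together in one river definition. This is a sketch only: the chosen values (20000, 8, "20m") are arbitrary examples, not recommendations, and the MySQL URL is the README's illustrative one.

```shell
#!/bin/sh
# Sketch only: overriding the bulk defaults listed above
# (default: 10000 actions, 2 * CPU cores concurrent requests, "10m" volume).
BODY='{
    "type" : "jdbc",
    "max_bulk_actions" : 20000,
    "max_concurrent_bulk_requests" : 8,
    "max_bulk_volume" : "20m",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "sql" : "select * from orders"
    }
}'
echo "$BODY"
```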
@@ -372,7 +366,7 @@ Quartz cron expression format (see below).
     "schedule" : null,
     "interval" : 0L,
     "threadpoolsize" : 4,
-    "max_bulk_actions" : 100,
+    "max_bulk_actions" : 10000,
     "max_concurrent_bulk_requests" : 2 * available CPU cores,
     "max_bulk_volume" : "10m",
     "max_request_wait" : "60s",
@@ -387,7 +381,7 @@ Quartz cron expression format (see below).
     "rounding" : null,
     "scale" : 2,
     "autocommit" : false,
-    "fetchsize" : 10,
+    "fetchsize" : 10, /* MySQL: Integer.MIN */
     "max_rows" : 0,
     "max_retries" : 3,
     "max_retries_wait" : "30s",
@@ -502,64 +496,61 @@ It is very important to note that overuse of overflowing ranges creates ranges t
 and no effort has been made to determine which interpretation CronExpression chooses.
 An example would be "0 0 14-6 ? * FRI-MON".

-## How to start a JDBC feeder
-
-In the `bin/feeder` directory, you find some feeder examples.
-
-A feed can be started from the `$ES_HOME/plugins/jdbc` folder. If not already present, you should
-create a `bin` folder so it is easy to maintain feeder script side by side with the river.
+## How to run a standalone JDBC feeder

-The feeder script must include the Elasticsearch core libraries into the classpath. Note the `-cp`
-parameter.
+A feeder can be started from a shell script. For this, the Elasticsearch home directory must be set in
+the environment variable ES_HOME. The JDBC plugin jar must be placed in the same directory as the script,
+together with the JDBC river jar(s).

-Here is an example of a feeder bash script in `$ES_HOME/plugins/jdbc/bin/feeder/oracle.create.sh`
+Here is an example of a feeder bash script:

     #!/bin/sh

-    java="/usr/bin/java"
-    #java="/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java"
-    #java="/usr/java/jdk1.8.0/bin/java"
-
+    DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+    # ES_HOME required to detect elasticsearch jars
+    export ES_HOME=~es/elasticsearch-1.4.0.Beta1
+
     echo '
     {
         "elasticsearch" : {
             "cluster" : "elasticsearch",
             "host" : "localhost",
             "port" : 9300
         },
-        "concurrency" : 1,
         "type" : "jdbc",
         "jdbc" : {
-            "url" : "jdbc:oracle:thin:@//host:1521/sid",
-            "user" : "user",
-            "password" : "password",
-            "sql" : "select or_id as \"_id\", or_tan as \"tan\" from orders",
-            "index" : "myoracle",
-            "type" : "myoracle",
-            "index_settings" : {
-                "index" : {
-                    "number_of_shards" : 1,
-                    "number_of_replica" : 0
-                }
-            }
-        }
+            "url" : "jdbc:mysql://localhost:3306/test",
+            "user" : "",
+            "password" : "",
+            "sql" : "select *, page_id as _id from page",
+            "treat_binary_as_string" : true,
+            "index" : "metawiki"
+        }
     }
-    ' | ${java} \
-        -cp $(pwd):$(pwd)/\*:$(pwd)/../../lib/\* \
+    ' | java \
+        -cp "${DIR}/*" \
         org.xbib.elasticsearch.plugin.jdbc.feeder.Runner \
         org.xbib.elasticsearch.plugin.jdbc.feeder.JDBCFeeder

-The `jdbc` parameter structure is exactly the same as in a river.
+How does it work?
+
+- first, the shell script finds out the directory where it is placed and stores it in a variable `DIR`

-The feeder is invoked by `JDBCFeeder` class and understands some more parameters. In this example,
-the default parameters are shown.
+- second, the location of the Elasticsearch home is exported in a shell variable `ES_HOME`

-`elasticsearch` - an structure describing cluster, host, and port of a host of an Elasticsearch cluster.
+- the classpath must be set to `DIR/*` to detect the JDBC plugin jar in the same directory as the script

-`concurrency` - how many `jdbc` jobs should be executed in parallel
+- the "Runner" class expands the classpath over the Elasticsearch jars in `ES_HOME/lib` and also looks in `ES_HOME/plugins/jdbc`

-In the example, you can also see that you can change your favorite `java` executable when
-executing a feed. You must use a Java JDK >= 1.7
+- the "Runner" class invokes the "JDBCFeeder", which reads from stdin a JSON definition that corresponds to a JDBC river definition
+
+- the `elasticsearch` structure specifies the cluster, host, and port of a connection to an Elasticsearch cluster
+
+The `jdbc` parameter structure in the definition is exactly the same as in a river.
+
+It is possible to write an equivalent of this bash script for Windows.
+If you can send one to me for documentation on this page, I'd be very grateful.

 ## Structured objects
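The first step of the walkthrough above (resolving the script's own directory into `DIR` so that `-cp "${DIR}/*"` finds jars placed next to the script) can be tried standalone. This sketch substitutes the portable `$0` for the bash-only `${BASH_SOURCE[0]}` used in the feeder script.

```shell
#!/bin/sh
# Sketch only: resolve the directory this script lives in, so that
# -cp "${DIR}/*" would pick up the plugin jars placed next to it.
# Uses $0 instead of the bash-only ${BASH_SOURCE[0]}.
DIR="$( cd "$( dirname "$0" )" && pwd )"
echo "feeder classpath would be: ${DIR}/*"
```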
@@ -812,56 +803,40 @@ in callable statement result sets.
 If there is more than one result sets returned by a callable statement,
 the JDBC plugin enters a loop and iterates through all result sets.

-# Monitoring the JDBC river state
+# Monitoring the JDBC plugin state

 While a river/feed is running, you can monitor the activity by using the `_state` command.

-When running very large data fetches, it might be of interest to find out if the fetch is complete or still running.
+The `_state` command can show the state of a specific river or of all rivers,
+when an asterisk `*` is used as the river name.

-The `_state` command can show the state of a specific river or of all rivers, when an asterisk `*` is used as the river name.
+The river state mechanism is specific to the JDBC plugin implementation. It is part of the cluster metadata.

-In the result, you can evaluate the field `active`. If set to `true`, the river is actively fetching data from the database.
+In the response, the field `started` represents the time when the river/feeder was created.
+The field `last_active_begin` represents the last time a river/feeder run began, and
+the field `last_active_end` is null while the river/feeder runs, or represents the last time the river/feeder
+completed a run.

-In the field `timestamp`, the latest state modification of the river is recorded.
+The `map` carries some flags for the river: `aborted`, `suspended`, and a `counter` for the number of
+invocations on this node.

 Example:

     curl 'localhost:9200/_river/jdbc/*/_state?pretty'
     {
         "state" : [ {
-            "name" : "my_oracle_river",
+            "name" : "feeder",
             "type" : "jdbc",
-            "enabled" : true,
-            "started" : "2014-05-10T20:29:04.260Z",
-            "timestamp" : "2014-05-10T20:52:15.866Z",
-            "counter" : 3,
-            "active" : true,
-            "custom" : {
-                "rivername" : "feeder",
-                "settings" : {
-                    "index" : "myoracle",
-                    "sql" : [ "select or_id as \"_id\", or_tan as \"tan\" from orders" ],
-                    "maxbulkactions" : 10,
-                    "type" : "myoracle",
-                    "password" : "...",
-                    "user" : "...",
-                    "url" : "jdbc:oracle:thin:@//localhost:1521/sid"
-                },
-                "locale" : "de_",
-                "job" : null,
-                "sql" : [ "statement=select or_id as \"_id\", or_tan as \"tan\" from orders parameter=[] callable=false" ],
-                "autocommit" : false,
-                "fetchsize" : 10,
-                "maxrows" : 0,
-                "retries" : 3,
-                "maxretrywait" : "30s",
-                "resultsetconcurrency" : "CONCUR_UPDATABLE",
-                "resultsettype" : "TYPE_FORWARD_ONLY",
-                "rounding" : 0,
-                "scale" : 2
+            "started" : "2014-10-18T13:38:14.436Z",
+            "last_active_begin" : "2014-10-18T17:46:47.548Z",
+            "last_active_end" : "2014-10-18T13:42:57.678Z",
+            "map" : {
+                "aborted" : false,
+                "suspended" : false,
+                "counter" : 6
             }
         } ]
-}
+    }


 # Advanced topics
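Building on the field semantics described in the monitoring hunk above (`last_active_end` is null while a run is in progress), a monitoring script might test the `_state` response like this. The response here is a canned sample so the sketch runs without a cluster; a real script would fetch it with the curl command shown in the README.

```shell
#!/bin/sh
# Sketch only: decide from a _state response whether a run is in progress.
# In a real setup the JSON would come from:
#   curl 'localhost:9200/_river/jdbc/*/_state?pretty'
STATE='{
    "state" : [ {
        "name" : "feeder",
        "type" : "jdbc",
        "last_active_begin" : "2014-10-18T17:46:47.548Z",
        "last_active_end" : null
    } ]
}'
# Per the README, last_active_end stays null until the run completes.
if echo "$STATE" | grep -q '"last_active_end" : null'; then
    RUNNING=yes
else
    RUNNING=no
fi
echo "run in progress: $RUNNING"
```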

bin/feeder/mysql/create.sh

Lines changed: 0 additions & 38 deletions
This file was deleted.

bin/feeder/mysql/log4j.properties

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# for feeder
+
+log4j.rootLogger=DEBUG, out
+
+log4j.appender.out=org.apache.log4j.ConsoleAppender
+log4j.appender.out.layout=org.apache.log4j.PatternLayout
+log4j.appender.out.layout.ConversionPattern=[%d{ABSOLUTE}][%-5p][%-25c][%t] %m%n
+

bin/feeder/mysql/simpleexample.bat

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+@echo off
+
+SETLOCAL
+
+if NOT DEFINED ES_HOME goto err
+
+set DIR=%~dp0
+
+set FEEDER_CLASSPATH=%DIR%/*
+
+REM ???
+echo {^
+ "elasticsearch" : {^
+     "cluster" : "elasticsearch",^
+     "host" : "localhost",^
+     "port" : 9300^
+ },^
+ "type" : "jdbc",^
+ "jdbc" : {^
+     "url" : "jdbc:mysql://localhost:3306/test",^
+     "user" : "",^
+     "password" : "",^
+     "sql" : "select *, page_id as _id from page",^
+     "treat_binary_as_string" : true,^
+     "index" : "metawiki"^
+ }^
+}
+
+"%JAVA_HOME%\bin\java" -cp "%FEEDER_CLASSPATH%" "org.xbib.elasticsearch.plugin.jdbc.feeder.Runner" "org.xbib.elasticsearch.plugin.jdbc.feeder.JDBCFeeder"
+goto finally
+
+:err
+echo JAVA_HOME and ES_HOME environment variable must be set!
+pause
+
+
+:finally
+
+ENDLOCAL

bin/feeder/mysql/simpleexample.sh

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+# ES_HOME required to detect elasticsearch jars
+export ES_HOME=~es/elasticsearch-1.4.0.Beta1
+
+echo '
+{
+    "elasticsearch" : {
+        "cluster" : "elasticsearch",
+        "host" : "localhost",
+        "port" : 9300
+    },
+    "type" : "jdbc",
+    "jdbc" : {
+        "url" : "jdbc:mysql://localhost:3306/test",
+        "user" : "",
+        "password" : "",
+        "sql" : "select *, page_id as _id from page",
+        "treat_binary_as_string" : true,
+        "index" : "metawiki"
+    }
+}
+' | java \
+    -cp "${DIR}/*" \
+    org.xbib.elasticsearch.plugin.jdbc.feeder.Runner \
+    org.xbib.elasticsearch.plugin.jdbc.feeder.JDBCFeeder
