|
1 | | -# Wikidata-Toolkit-Examples |
| 1 | +# Wikidata Toolkit Examples |
2 | 2 |
|
3 | | -This repository contains example programs that show some of the features |
4 | | -of Wikidata Toolkit. |
| 3 | +This is an example project that shows how to set up a Java project that |
| 4 | +uses [Wikidata Toolkit](https://github.com/Wikidata/Wikidata-Toolkit). |
| 5 | +It contains several simple example programs and bots in the source directory. |
5 | 6 |
|
6 | | -Overview and Settings |
7 | | ---------------------- |
8 | | - |
9 | | -A detailed guide to each of the examples is given below. Many examples process data |
10 | | -dumps exported by Wikidata. In most cases, the example only contains the actual |
11 | | -processing code that does something interesting. The code for downloading dumps and |
12 | | -iterating over them is in the ExampleHelpers.java class, which is used in many examples |
13 | | -for common tasks. |
14 | | - |
15 | | -You can edit the static members in ExampleHelpers to select which dumps should be |
16 | | -used (the data is available in several formats which may be more or less recent |
17 | | -and more or less comprehensive). You can also switch to offline mode there: then |
18 | | -only the files downloaded previously will be used. This is convenient for testing |
19 | | -to avoid downloading new files when you don't really need absolutely current data. |
20 | | -By default, the code will fetch the most recent JSON dumps from the Web. |
21 | | - |
22 | | -Some examples write their output to files. These files are put into the subdirectory |
23 | | -"results" under the directory from where the application is run. Files in CSV |
24 | | -format can be loaded in any spreadsheet tool to make diagrams, for example. |
25 | | - |
26 | | -Guide to the Available Examples |
| 7 | +What's found in this repository |
27 | 8 | ------------------------------- |
28 | 9 |
|
29 | | -Ordered roughly from basic to advanced/specific. |
30 | | - |
31 | | -#### EntityStatisticsProcessor.java #### |
32 | | - |
33 | | -This program processes entities (items and properties) to collect some basic |
34 | | -statistics. It counts how many items and properties there are, the number of labels, |
35 | | -descriptions, and aliases, and the number of statements. This code might be useful |
36 | | -to get to know the basic data structures where these things are stored. The example |
37 | | -also counts the usage of each property in more details: its use in the main part |
38 | | -of statements, in qualifiers, and in references is counted separately. The results |
39 | | -for this are written into a CSV file in the end. |
40 | | - |
41 | | -#### FetchOnlineDataExample.java #### |
42 | | - |
43 | | -This program shows how to fetch live data from wikidata.org via the Web API. This can |
44 | | -be used with any other Wikibase site as well. It is not practical to fetch all data |
45 | | -in this way, but it can be very convenient to get some data directly even when processing |
46 | | -a dump (since the dump can only be read in sequence). |
47 | | - |
48 | | -#### EditOnlineDataExample.java #### |
49 | | - |
50 | | -This program shows how to create and modify live data on test.wikidata.org via the Web API. |
51 | | -This can be used with any other Wikibase site as well. The example first creates a new item |
52 | | -with some starting data, then adds some additional statements, and finally modifies and |
53 | | -deletes existing statements. All data modifications automatically use revision ids to make |
54 | | -sure that no edit conflicts occur (and we don't modify/delete data that is different from |
55 | | -what we expect). |
56 | | - |
57 | | -#### LocalDumpFileExample.java #### |
58 | | - |
59 | | -This program shows how to process a data dump that is available in a local file, rather |
60 | | -than being automatically downloaded (and possibly cached) from the Wikimedia site. |
61 | | - |
62 | | -#### GreatestNumberProcessor.java #### |
63 | | - |
64 | | -This simple program looks at all values of a number property to find the item with the |
65 | | -greatest value. It will print the result to the console. In most cases, the item with |
66 | | -the greatest number is fairly early in the data export, so watching the program work is |
67 | | -not too exciting, but it shows how to read a single property value to do something with |
68 | | -it. The property that is used is defined by a constant in the code and can be changed to |
69 | | -see some other greatest values. |
70 | | - |
71 | | -#### LifeExpectancyProcessor.java #### |
72 | | - |
73 | | -This program processes items to compute the average life expectancy of people on |
74 | | -Wikidata. It shows how to get details (here: year numbers) of specific statement values |
75 | | -for specific properties (here we use Wikidata's P569 "birth date" and P570 "death date"). |
76 | | -The results are stored in a CSV file that shows average life expectancy by year of |
77 | | -birth. The overall average is also printed to the output. |
78 | | - |
79 | | -#### WorldMapProcessor.java #### |
80 | | - |
81 | | -This program generates images of world maps based on the locations of Wikidata items, |
82 | | -and stores the result in PNG files. The example builds several maps, for Wikidata as |
83 | | -a whole and for several big Wikipedias (counting only items with an article in there). |
84 | | -The code offers easy-to-adjust parameters for the size of the output images, the |
85 | | -Wikimedia projects to consider, and the scale of the color values. |
86 | | - |
87 | | -[Wikidata world maps for June 2015](https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en) |
88 | | - |
89 | | -#### GenderRatioProcessor.java #### |
90 | | - |
91 | | -This program uses Wikidata to analyse the number of articles that exist on certain |
92 | | -topics in different Wikimedia projects (esp. in Wikipedias). In particular, it counts |
93 | | -the number of articles about humans and humans of a specific gender (female, male, etc.). |
94 | | -Can be used to estimate the gender balance of various Wikipedias. The results are stored |
95 | | -in a CSV file (all projects x all genders), but for the largest projects they are also |
96 | | -printed to the output. This example is inspired by Max Klein's work on this topic. |
97 | | - |
98 | | -[Related blog post by Max Klein](http://notconfusing.com/sex-ratios-in-wikidata-part-iii/) |
99 | | - |
100 | | -#### JsonSerializationProcessor.java #### |
101 | | - |
102 | | -This program creates a JSON file that contains English language terms, birthdate, occupation, |
103 | | -and image for all people on Wikidata who were born in Dresden (the code can easily be |
104 | | -modified to make a different selection). The example shows how to serialize Wikidata Toolkit |
105 | | -objects in JSON, how to select item documents by a property, and how to filter documents to |
106 | | -ignore some of the data. The resulting file is small (less than 1M). |
107 | | - |
108 | | -#### SitelinksExample.java #### |
| 10 | +The individual examples are documented in the README file of each package. |
109 | 11 |
|
110 | | -This program shows how to get information about the site links that are used in Wikidata |
111 | | -dumps. The links to Wikimedia projects use keys like "enwiki" for English Wikipedia or |
112 | | -"hewikivoyage" for Hebrew WikiVoyage. To find out the meaning of these codes, and to |
113 | | -create URLs for the articles on these projects, Wikidata Toolkit includes some simple |
114 | | -functions that download and process the site links information for a given project. |
115 | | -This example shows how to use this functionality. |
116 | 12 |
|
117 | | -#### ClassPropertyUsageExample.java #### |
| 13 | +Running examples using an IDE |
| 14 | +----------------------------- |
118 | 15 |
|
119 | | -This advanced program analyses the use of properties and classes on Wikidata, and creates |
120 | | -output that can be used in the [Miga data browser](http://migadv.com/). You can see the |
121 | | -result online at http://tools.wmflabs.org/wikidata-exports/miga/. The program is slightly |
122 | | -more complex, involving several processing steps and additional code for formatting output |
123 | | -for CSV files. |
| 16 | +You can import the project into any Java IDE that supports Maven (and maybe git) |
| 17 | +and run the example programs from there. Wikidata Toolkit provides detailed |
| 18 | +[instructions on how to set up Eclipse for using Maven and git](https://www.mediawiki.org/wiki/Wikidata_Toolkit/Eclipse_setup). |
124 | 19 |
|
125 | | -#### RdfSerializationExample.java #### |
126 | 20 |
|
127 | | -This program creates an RDF export. You can also do this directly using the command line |
128 | | -client. The purpose of this program is just to show how this could be done in code, e.g., |
129 | | -to implement additional pre-processing before the RDF serialisation. |
| 21 | +Running examples directly using Maven |
| 22 | +------------------------------------- |
130 | 23 |
|
| 24 | +You can also run the code directly using Maven from the command line. For this, |
| 25 | +you need to have Maven and (obviously) Java installed. To compile the project |
| 26 | +and obtain necessary dependencies, run |
131 | 27 |
|
132 | | -Other Helper Code |
133 | | ------------------ |
| 28 | +```mvn compile``` |
134 | 29 |
|
135 | | -#### ExampleHelpers.java #### |
| 30 | +Thereafter, you can run any individual example using its Java class name, for |
| 31 | +example: |
136 | 32 |
|
137 | | -This class provides static helper methods to iterate through dumps, to configure the |
138 | | -desired logging behaviour, and to write files to the "results" directory. It also allows |
139 | | -you to change some global settings that will affect most examples. The code is of interest |
140 | | -if you want to find out how to build a standalone application that includes all aspects |
141 | | -without relying on the example module. |
| 33 | +```mvn exec:java -Dexec.mainClass="examples.FetchOnlineDataExample"``` |
142 | 34 |
|
143 | | -#### EntityTimerProcessor.java #### |
| 35 | +Credits and License |
| 36 | +------------------- |
144 | 37 |
|
145 | | -This is a helper class that is used in all examples to print basic timer information and |
146 | | -to provide support for having a timeout (cleanly abort processing after a fixed time, even |
147 | | -if the dump would take much longer to complete; useful for testing). It should not be of |
148 | | -primary interest for learning how to use Wikidata Toolkit, but you can have a look to find |
149 | | -out how to use our Timer class. |
| 38 | +This project is copied from the [Wikidata Toolkit](https://github.com/Wikidata/Wikidata-Toolkit) examples module. |
| 39 | +Authors can be found there. |
150 | 40 |
|
151 | | -Additional Resources |
152 | | --------------------- |
| 41 | +License: [Apache 2.0](LICENSE) |
153 | 42 |
|
154 | | -* [Wikidata Toolkit homepage](https://www.mediawiki.org/wiki/Wikidata_Toolkit) |
155 | | -* [Wikidata Toolkit Javadocs](http://wikidata.github.io/Wikidata-Toolkit/) |
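The two Maven steps described in the added README text (compile once, then run one example by class name) can be sketched as a small shell helper. This is only an illustration: `example_cmd` is a hypothetical function name, not part of the repository, and the `examples.` package prefix is copied from the README's own `exec:java` command. The helper prints the commands rather than running them, so it does not assume Maven is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the Maven command
# that runs a single example class. The "examples." package prefix is taken
# from the command shown in the README.
example_cmd() {
  printf 'mvn exec:java -Dexec.mainClass="examples.%s"\n' "$1"
}

# Print the two steps for one example. To actually run them, paste the
# printed lines into a shell at the project root (the directory with pom.xml).
echo "mvn compile"
example_cmd FetchOnlineDataExample
```

Any other example class named in the README can be passed instead, for instance `example_cmd EditOnlineDataExample`, assuming it lives in the same `examples` package.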