⚠️ Warning: As of late 2023, the ORES infrastructure is being deprecated by the WMF Machine Learning team, please check https://wikitech.wikimedia.org/wiki/ORES for more info.While the code in this repository may still work, it is unmaintained, and as such may break at any time. Special consideration should also be given to machine learning models seeing drift in quality of predictions.
The replacement for ORES and associated infrastructure is Lift Wing: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing
Some Revscoring models from ORES run on the Lift Wing infrastructure, but they are otherwise unsupported (no new training or code updates).
They can be downloaded from the links documented at: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Revscoring_models_(migrated_from_ORES)
In the long term, some or all these models may be replaced by newer models specifically tailored to be run on modern ML infrastructure like Lift Wing.
If you have any questions, contact the WMF Machine Learning team: https://wikitech.wikimedia.org/wiki/Machine_Learning
A generic, machine learning-based revision scoring system designed to help automate critical wiki-work — for example, vandalism detection and removal. This library powers ORES.
Using a scorer_model to score a revision::
  import mwapi
  from revscoring import Model
  from revscoring.extractors.api.extractor import Extractor
  with open("models/enwiki.damaging.linear_svc.model") as f:
       scorer_model = Model.load(f)
  extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                          user_agent="revscoring demo"))
  feature_values = list(extractor.extract(123456789, scorer_model.features))
  print(scorer_model.score(feature_values))
  {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}The easiest way to install is via the Python package installer (pip).
pip install revscoring
You may find that some of the dependencies fail to compile (namely
scipy, numpy and sklearn).  In that case, you'll need to install some
dependencies in your operating system.
- Run 
sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev enchant - Run 
sudo apt-get install aspell-ar aspell-bn aspell-el aspell-id aspell-is aspell-pl aspell-ro aspell-sv aspell-ta aspell-uk myspell-cs myspell-de-at myspell-de-ch myspell-de-de myspell-es myspell-et myspell-fa myspell-fr myspell-he myspell-hr myspell-hu myspell-lv myspell-nb myspell-nl myspell-pt-pt myspell-pt-br myspell-ru myspell-hr hunspell-bs hunspell-ca hunspell-en-au hunspell-en-us hunspell-en-gb hunspell-eu hunspell-gl hunspell-it hunspell-hi hunspell-sr hunspell-vi voikko-fi 
Using Homebrew and pip, installing revscoring and enchant can be accomplished
as follows::
brew install aspell --with-all-languages
brew install enchant
pip install --no-binary pyenchant revscoringcd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make installCaveats: 
 The differences between the aspell and myspell dictionaries can cause 
 some of the tests to fail 
Finally, in order to make use of language features, you'll need to download some NLTK data. The following command will get the necessary corpora.
python -m nltk.downloader omw sentiwordnet stopwords wordnet
You'll also need to install enchant-compatible dictionaries of the languages you'd like to use. We recommend the following:
- languages.arabic: aspell-ar
 - languages.basque: hunspell-eu
 - languages.bengali: aspell-bn
 - languages.bosnian: hunspell-bs
 - languages.catalan: myspell-ca
 - languages.czech: myspell-cs
 - languages.croatian: myspell-hr
 - languages.dutch: myspell-nl
 - languages.english: myspell-en-us myspell-en-gb myspell-en-au
 - languages.estonian: myspell-et
 - languages.finnish: voikko-fi
 - languages.french: myspell-fr
 - languages.galician: hunspell-gl
 - languages.german: myspell-de-at myspell-de-ch myspell-de-de
 - languages.greek: aspell-el
 - languages.hebrew: myspell-he
 - languages.hindi: aspell-hi
 - languages.hungarian: myspell-hu
 - languages.icelandic: aspell-is
 - languages.indonesian: aspell-id
 - languages.italian: myspell-it
 - languages.latvian: myspell-lv
 - languages.norwegian: myspell-nb
 - languages.persian: myspell-fa
 - languages.polish: aspell-pl
 - languages.portuguese: myspell-pt-pt myspell-pt-br
 - languages.serbian: hunspell-sr
 - languages.spanish: myspell-es
 - languages.swedish: aspell-sv
 - languages.tamil: aspell-ta
 - languages.russian: myspell-ru
 - languages.ukrainian: aspell-uk
 - languages.vietnamese: hunspell-vi
 
To contribute, ensure to install the dependencies:
$ pip install -r requirements.txtInstall necessary NLTK data:
python -m nltk.downloader omw sentiwordnet stopwords wordnet
Make sure you install test dependencies:
$ pip install -r test-requirements.txtThen run:
$ pytest . -vvTo report a bug, please use Phabricator