-#!/usr/bin/python
-# -*- coding: utf-8 -*-
+"""Assignment - making a sklearn estimator and CV splitter.
+
+The goal of this assignment is to implement on your own:
+
+- a scikit-learn estimator implementing KNearestNeighbors for classification
+  tasks, and check that it works properly;
+- a scikit-learn CV splitter where the splits are based on a Pandas
+  DatetimeIndex.
+
+Detailed instructions for question 1:
+For a point X_i, the nearest neighbor classifier predicts the target y_k of
+the training sample X_k that is closest to X_i. Proximity is measured with
+the Euclidean distance. The model is evaluated with accuracy (the fraction
+of correctly classified samples). You need to implement the `fit`,
+`predict` and `score` methods for this class. The code you write should
+pass the tests we implemented. You can run them by calling
+`pytest test_sklearn_questions.py` at the root of the repo. Note that, to be
+fully valid, a scikit-learn estimator needs to check that the inputs given
+to `fit` and `predict` are correct, using the `check_*` functions imported
+in the file. You can find more information on how they should be used in
+the following doc:
+https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.
+Make sure to use them to pass `test_nearest_neighbor_check_estimator`.
+
+Detailed instructions for question 2:
+The data to split should have either its index or one of its columns in
+datetime format. The aim is then to split the data into train and test
+sets such that, for each pair of successive months, we learn on the first
+and predict on the second. For example, if you have data distributed from
+November 2020 to March 2021, you have 4 splits. The first split learns on
+November data and predicts on December data, the second split learns on
+December data and predicts on January data, etc.
+
+We also ask you to respect the PEP 8 convention: https://pep8.org. This will
+be enforced with `flake8`. You can check that there are no flake8 errors by
+calling `flake8` at the root of the repo.
+
+Finally, you need to write docstrings for the class and for the methods you
+code. The docstrings will be checked with `pydocstyle`, which you can also
+call at the root of the repo.
+
+Hints
+-----
+- You can use the function:
+
+from sklearn.metrics.pairwise import pairwise_distances
+
+to compute distances between two sets of samples.
+"""
 import numpy as np
 import pandas as pd
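The nearest neighbor rule from question 1 can be illustrated with a minimal sketch. It is not the reference solution: the class name `NearestNeighborSketch` is hypothetical, it covers only the single-neighbor case the instructions describe (predict the label of the closest training sample), and it is not guaranteed to pass every check in `test_sklearn_questions.py`. It relies on `pairwise_distances` from the hint and on the `check_X_y`, `check_array` and `check_is_fitted` helpers from `sklearn.utils.validation`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.utils.multiclass import check_classification_targets
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class NearestNeighborSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical 1-nearest-neighbor classifier (Euclidean distance)."""

    def fit(self, X, y):
        """Validate and memorize the training set."""
        X, y = check_X_y(X, y)
        check_classification_targets(y)  # reject continuous targets
        self.classes_ = np.unique(y)
        self.n_features_in_ = X.shape[1]
        self.X_train_, self.y_train_ = X, y
        return self

    def predict(self, X):
        """Return the label of the closest training sample for each row."""
        check_is_fitted(self)
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, expected {self.n_features_in_}"
            )
        # dist[i, j] = Euclidean distance between X[i] and X_train_[j].
        dist = pairwise_distances(X, self.X_train_, metric="euclidean")
        # argmin picks the closest training row (first one on ties).
        return self.y_train_[np.argmin(dist, axis=1)]

    def score(self, X, y):
        """Return the accuracy, i.e. the fraction of correct predictions."""
        X, y = check_X_y(X, y)
        return np.mean(self.predict(X) == y)


X = np.array([[0.0], [1.0], [10.0]])
y = np.array([0, 0, 1])
clf = NearestNeighborSketch().fit(X, y)
print(clf.predict(np.array([[0.4], [9.0]])))  # [0 1]
print(clf.score(X, y))  # 1.0
```

With training points 0, 1 and 10 labelled 0, 0 and 1, the query 0.4 is closest to 0 and gets label 0, while 9.0 is closest to 10 and gets label 1. Storing the whole training set in `fit` and deferring all distance computations to `predict` is the usual design for this lazy classifier.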
|
|
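The monthly splitter from question 2 can be sketched in the same spirit. The class name `MonthlySplitSketch` and its `time_col` parameter are hypothetical: `time_col=None` uses the DataFrame's DatetimeIndex, otherwise the named column must hold datetime values. Each timestamp is mapped to its calendar month with `to_period("M")`, and every pair of successive months between the first and the last month yields one (train, test) pair of row positions. The `split` and `get_n_splits` method names and their `(X, y=None, groups=None)` arguments mirror the interface of scikit-learn's CV splitters.

```python
import numpy as np
import pandas as pd


class MonthlySplitSketch:
    """Hypothetical splitter: train on month m, test on month m + 1."""

    def __init__(self, time_col=None):
        # time_col=None: use the DataFrame's DatetimeIndex;
        # otherwise: use the named column, which must hold datetimes.
        self.time_col = time_col

    def _months(self, X):
        """Map each row of X to its calendar month (a pandas Period)."""
        dates = X.index if self.time_col is None else X[self.time_col]
        if not pd.api.types.is_datetime64_any_dtype(dates):
            raise ValueError("a datetime index or column is required")
        return pd.Series(pd.DatetimeIndex(dates).to_period("M"))

    def _month_range(self, months):
        """All calendar months from the first to the last one in the data."""
        return pd.period_range(months.min(), months.max(), freq="M")

    def get_n_splits(self, X, y=None, groups=None):
        """Return one split per pair of successive months."""
        return len(self._month_range(self._months(X))) - 1

    def split(self, X, y=None, groups=None):
        """Yield (train, test) arrays of row positions."""
        months = self._months(X)
        month_range = self._month_range(months)
        for train_month, test_month in zip(month_range[:-1], month_range[1:]):
            train_idx = np.flatnonzero((months == train_month).to_numpy())
            test_idx = np.flatnonzero((months == test_month).to_numpy())
            yield train_idx, test_idx


idx = pd.date_range("2020-11-01", "2021-03-31", freq="D")
X = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)
cv = MonthlySplitSketch()
print(cv.get_n_splits(X))  # 4
for train, test in cv.split(X):
    print(X.index[train[0]].strftime("%Y-%m"), "->",
          X.index[test[0]].strftime("%Y-%m"))
```

On daily data from November 2020 to March 2021 this gives the 4 splits of the example, the first one training on the 30 November rows and testing on the 31 December rows. Returning positions rather than labels keeps the splits valid for unsorted data. A month with no rows inside the range would produce an empty train or test set; the instructions leave the handling of that case open.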