diff --git a/README.md b/README.md
index 3a74393..81f381d 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,10 @@
 
 This repo contains a `build.sh` script that's intended to be run in an Amazon
 Linux docker container, and build scikit-learn, numpy, and scipy for use in AWS
-Lambda. For more info about how the script works, and how to use it, see my
+Lambda with Python 3.6. For more info about how the script works, and how to use it, see my
 [blog post on deploying sklearn to Lambda](https://serverlesscode.com/post/scikitlearn-with-amazon-linux-container/).
 
-There was an older version of this repo, now archived in the
-[ec2-build-process](https://github.com/ryansb/sklearn-build-lambda/tree/ec2-build-process)
-branch, used an EC2 instance to perform the build process and an Ansible
-playbook to execute the build. That version still works, but the new dockerized
-version doesn't require you to launch a remote instance.
+Because Python 3.6 is not yet yum-installable, we use a Linux build of Python 3.6 available with [pyenv](https://github.com/pyenv/pyenv).
 
 To build the zipfile, pull the Amazon Linux image and run the build script in
 it.
@@ -23,7 +19,7 @@ $ docker run -v $(pwd):/outputs -it amazonlinux:2016.09 \
 ```
 
 That will make a file called `venv.zip` in the local directory that's around
-40MB.
+45MB.
 
 Once you run this, you'll have a zipfile containing sklearn and its
 dependencies, to use them add your handler file to the zip, and add the `lib`
@@ -52,13 +48,13 @@ def handler(event, context):
 ## Sizing and Future Work
 
 With just compression and stripped binaries, the full sklearn stack weighs in
-at 39 MB, and could probably be reduced further by:
+at 45 MB, and could probably be reduced further by:
 
 1. Pre-compiling all .pyc files and deleting their source
 1. Removing test files
 1. Removing documentation
 
-For my purposes, 39 MB is sufficiently small, if you have any improvements to
+For my purposes, 45 MB is sufficiently small, if you have any improvements to
 share pull requests or issues are welcome.
 
 ## License
diff --git a/build.sh b/build.sh
index 5068ea5..2c0634e 100644
--- a/build.sh
+++ b/build.sh
@@ -9,24 +9,29 @@ yum install -y \
     gcc \
     gcc-c++ \
     lapack-devel \
-    python27-devel \
-    python27-virtualenv \
     findutils \
-    zip
+    zip \
+    zlib-devel \
+    git \
+    openssl \
+    openssl-devel
 
 do_pip () {
-    pip install --upgrade pip wheel
-    pip install --use-wheel --no-binary numpy numpy
-    pip install --use-wheel --no-binary scipy scipy
-    pip install --use-wheel sklearn
+    /root/.pyenv/shims/python3.6 -m venv --copies /sklearn_build
+    source /sklearn_build/bin/activate
+
+    pip3.6 install --upgrade pip wheel
+    pip3.6 install --use-wheel --no-binary numpy numpy
+    pip3.6 install --use-wheel --no-binary scipy scipy
+    pip3.6 install --use-wheel sklearn
 }
 
 strip_virtualenv () {
     echo "venv original size $(du -sh $VIRTUAL_ENV | cut -f1)"
-    find $VIRTUAL_ENV/lib64/python2.7/site-packages/ -name "*.so" | xargs strip
+    find $VIRTUAL_ENV/lib64/python3.6/site-packages/ -name "*.so" | xargs strip
     echo "venv stripped size $(du -sh $VIRTUAL_ENV | cut -f1)"
 
-    pushd $VIRTUAL_ENV/lib64/python2.7/site-packages/ && zip -r -9 -q /outputs/venv.zip * ; popd
+    pushd $VIRTUAL_ENV/lib64/python3.6/site-packages/ && zip -r -9 -q /outputs/venv.zip * ; popd
     echo "site-packages compressed size $(du -sh /outputs/venv.zip | cut -f1)"
 
     pushd $VIRTUAL_ENV && zip -r -q /outputs/full-venv.zip * ; popd
@@ -34,24 +39,25 @@ strip_virtualenv () {
 }
 
 shared_libs () {
-    libdir="$VIRTUAL_ENV/lib64/python2.7/site-packages/lib/"
-    mkdir -p $VIRTUAL_ENV/lib64/python2.7/site-packages/lib || true
+    libdir="$VIRTUAL_ENV/lib64/python3.6/site-packages/lib/"
+    mkdir -p $VIRTUAL_ENV/lib64/python3.6/site-packages/lib || true
     cp /usr/lib64/atlas/* $libdir
     cp /usr/lib64/libquadmath.so.0 $libdir
     cp /usr/lib64/libgfortran.so.3 $libdir
 }
 
-main () {
-    /usr/bin/virtualenv \
-        --python /usr/bin/python /sklearn_build \
-        --always-copy \
-        --no-site-packages
-    source /sklearn_build/bin/activate
+install_36 () {
+    git clone https://github.com/pyenv/pyenv.git ~/.pyenv
+    ~/.pyenv/bin/pyenv install 3.6.2
+    ~/.pyenv/bin/pyenv global 3.6.2
+    /root/.pyenv/shims/python3.6 --version
+    /root/.pyenv/shims/pip3.6 --version
+}
 
+main () {
+    install_36
     do_pip
-
     shared_libs
-
     strip_virtualenv
 }
 main
diff --git a/sample-site-packages-2016-02-20.zip b/sample-site-packages-2016-02-20.zip
deleted file mode 100644
index 0547632..0000000
Binary files a/sample-site-packages-2016-02-20.zip and /dev/null differ
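
Review note: the `build.sh` changes above swap the `python2.7` path segment for `python3.6` in four places and pin `3.6.2` separately in `install_36`. As a possible follow-up, those strings could be derived from one variable so the next version bump touches a single line. The sketch below is only an illustration of that idea; `PY_VERSION`, `py_minor`, and `site_packages_dir` are hypothetical names that do not exist in the script, and the `lib64/pythonX.Y/site-packages` layout is assumed to match what the diff already uses.

```shell
#!/bin/bash
# Sketch only: PY_VERSION, py_minor and site_packages_dir are hypothetical
# names for illustration; they are not part of build.sh.
PY_VERSION="3.6.2"

# Drop the patch component of a version string: "3.6.2" -> "3.6".
py_minor () {
    echo "${1%.*}"
}

# Print the site-packages path for a virtualenv root.
# $1 is the virtualenv root (normally $VIRTUAL_ENV).
site_packages_dir () {
    echo "$1/lib64/python$(py_minor "$PY_VERSION")/site-packages"
}
```

With these in place, a line such as the `find` in `strip_virtualenv` could read `find "$(site_packages_dir "$VIRTUAL_ENV")" -name "*.so" | xargs strip`, and `install_36` could call `pyenv install "$PY_VERSION"`.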