Closed
14 changes: 5 additions & 9 deletions README.md
@@ -4,14 +4,10 @@

This repo contains a `build.sh` script that's intended to be run in an Amazon
Linux docker container, and build scikit-learn, numpy, and scipy for use in AWS
Lambda. For more info about how the script works, and how to use it, see my
Lambda with Python 3.6. For more info about how the script works, and how to use it, see my
[blog post on deploying sklearn to Lambda](https://serverlesscode.com/post/scikitlearn-with-amazon-linux-container/).

There was an older version of this repo, now archived in the
[ec2-build-process](https://github.com/ryansb/sklearn-build-lambda/tree/ec2-build-process)
branch, used an EC2 instance to perform the build process and an Ansible
playbook to execute the build. That version still works, but the new dockerized
version doesn't require you to launch a remote instance.
Because Python 3.6 is not yet yum-installable, we use a Linux build of Python 3.6 available with [pyenv](https://github.com/pyenv/pyenv).

To build the zipfile, pull the Amazon Linux image and run the build script in
it.
@@ -23,7 +19,7 @@ $ docker run -v $(pwd):/outputs -it amazonlinux:2016.09 \
```

That will make a file called `venv.zip` in the local directory that's around
40MB.
45MB.

Once you run this, you'll have a zipfile containing sklearn and its
dependencies, to use them add your handler file to the zip, and add the `lib`
@@ -52,13 +48,13 @@ def handler(event, context):
## Sizing and Future Work

With just compression and stripped binaries, the full sklearn stack weighs in
at 39 MB, and could probably be reduced further by:
at 45 MB, and could probably be reduced further by:

1. Pre-compiling all .pyc files and deleting their source
1. Removing test files
1. Removing documentation

For my purposes, 39 MB is sufficiently small, if you have any improvements to
For my purposes, 45 MB is sufficiently small, if you have any improvements to
share pull requests or issues are welcome.

## License
44 changes: 25 additions & 19 deletions build.sh
@@ -9,49 +9,55 @@ yum install -y \
gcc \
gcc-c++ \
lapack-devel \
python27-devel \
python27-virtualenv \
findutils \
zip
zip \
zlib-devel \
git \
openssl \
openssl-devel

do_pip () {
pip install --upgrade pip wheel
pip install --use-wheel --no-binary numpy numpy
pip install --use-wheel --no-binary scipy scipy
pip install --use-wheel sklearn
/root/.pyenv/shims/python3.6 -m venv --copies /sklearn_build
source /sklearn_build/bin/activate

pip3.6 install --upgrade pip wheel
pip3.6 install --use-wheel --no-binary numpy numpy
pip3.6 install --use-wheel --no-binary scipy scipy
pip3.6 install --use-wheel sklearn
}

strip_virtualenv () {
echo "venv original size $(du -sh $VIRTUAL_ENV | cut -f1)"
find $VIRTUAL_ENV/lib64/python2.7/site-packages/ -name "*.so" | xargs strip
find $VIRTUAL_ENV/lib64/python3.6/site-packages/ -name "*.so" | xargs strip
echo "venv stripped size $(du -sh $VIRTUAL_ENV | cut -f1)"

pushd $VIRTUAL_ENV/lib64/python2.7/site-packages/ && zip -r -9 -q /outputs/venv.zip * ; popd
pushd $VIRTUAL_ENV/lib64/python3.6/site-packages/ && zip -r -9 -q /outputs/venv.zip * ; popd
echo "site-packages compressed size $(du -sh /outputs/venv.zip | cut -f1)"

pushd $VIRTUAL_ENV && zip -r -q /outputs/full-venv.zip * ; popd
echo "venv compressed size $(du -sh /outputs/full-venv.zip | cut -f1)"
}

shared_libs () {
libdir="$VIRTUAL_ENV/lib64/python2.7/site-packages/lib/"
mkdir -p $VIRTUAL_ENV/lib64/python2.7/site-packages/lib || true
libdir="$VIRTUAL_ENV/lib64/python3.6/site-packages/lib/"
mkdir -p $VIRTUAL_ENV/lib64/python3.6/site-packages/lib || true
cp /usr/lib64/atlas/* $libdir
cp /usr/lib64/libquadmath.so.0 $libdir
cp /usr/lib64/libgfortran.so.3 $libdir
}

main () {
/usr/bin/virtualenv \
--python /usr/bin/python /sklearn_build \
--always-copy \
--no-site-packages
source /sklearn_build/bin/activate
install_36 () {
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
~/.pyenv/bin/pyenv install 3.6.2
~/.pyenv/bin/pyenv global 3.6.2
/root/.pyenv/shims/python3.6 --version
/root/.pyenv/shims/pip3.6 --version
}

main () {
install_36
do_pip

shared_libs

strip_virtualenv
}
main
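
One caveat about `do_pip` in the build.sh changes above: it upgrades pip first, and pip 10.0 removed the `--use-wheel` option, so with a current pip the later `pip3.6 install --use-wheel ...` calls would likely be rejected as an unknown option. The sketch below is one possible variant, not part of this PR. It assumes that dropping the flag is enough, since recent pip versions use wheels by default. The function name `do_pip_modern` and its optional argument are hypothetical, and `scikit-learn` is used because it is the canonical PyPI name for the package the script installs as `sklearn`.

```shell
# Hypothetical variant of do_pip for pip >= 10 (assumption: not part of this PR).
# $1 is the pip command to run; it defaults to the pip3.6 used in build.sh.
do_pip_modern () {
    pip_cmd="${1:-pip3.6}"

    "$pip_cmd" install --upgrade pip wheel
    # --no-binary forces a source build, as in the original script
    "$pip_cmd" install --no-binary numpy numpy
    "$pip_cmd" install --no-binary scipy scipy
    "$pip_cmd" install scikit-learn
}
```

Taking the pip command as an argument keeps the function easy to dry-run with a stub before spending time on a full source build of numpy and scipy.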
Binary file removed sample-site-packages-2016-02-20.zip
Binary file not shown.
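
The README's "Sizing and Future Work" list names "Removing test files" as one way to get below 45 MB. A minimal sketch of that step follows, assuming test suites live in directories named `tests` under site-packages. The helper name `prune_site_packages` is hypothetical and not part of build.sh. It would need to run before the zip commands in `strip_virtualenv`. Documentation directories are deliberately left alone here, because a directory named `doc` may be an importable package.

```shell
# Hypothetical helper (assumption: not part of build.sh).
# Deletes every directory named "tests" below the directory given as $1.
prune_site_packages () {
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 1
    fi

    echo "before prune: $(du -sh "$1" | cut -f1)"
    # -prune stops find from descending into a directory it is about to delete
    find "$1" -type d -name tests -prune -exec rm -rf {} +
    echo "after prune: $(du -sh "$1" | cut -f1)"
}
```

It could be called as `prune_site_packages "$VIRTUAL_ENV/lib64/python3.6/site-packages"`. Some packages may load files from their test directories at runtime, so it is worth confirming that `import sklearn` still works against the pruned tree before zipping it.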