iRODS S3 API

A project that presents an iRODS Zone as S3-compatible storage.

Implements a subset of the Amazon S3 API:

https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operations.html

This API currently supports:

The goal is to support the equivalent of:

ils - aws s3 ls s3://bucketname/a/b/c/
iput - aws s3 cp localfile s3://bucketname/a/b/c/filename
iget - aws s3 cp s3://bucketname/a/b/c/filename localfile
irm - aws s3 rm s3://bucketname/a/b/c/filename
imv - aws s3 mv s3://bucketname/a/b/c/filename1 s3://bucketname/a/b/c/filename2
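To make the correspondence concrete, the sketch below shows how an S3 URL could translate to an iRODS logical path once a bucket is mapped to a collection, as the static_bucket_resolver plugin's "mappings" option does. The function name s3_url_to_irods_path, the BUCKET_MAPPINGS dictionary, and the collection /tempZone/home/alice/bucket_root are hypothetical, and the server may resolve paths differently.

```python
from urllib.parse import urlparse

# Hypothetical bucket-to-collection mapping, for illustration only.
BUCKET_MAPPINGS = {"bucketname": "/tempZone/home/alice/bucket_root"}


def s3_url_to_irods_path(url, mappings=BUCKET_MAPPINGS):
    """Translate s3://bucket/key into an iRODS logical path (assumed scheme)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url}")
    collection = mappings[parsed.netloc]  # KeyError if the bucket is unmapped
    key = parsed.path.strip("/")
    # An empty key refers to the mapped collection itself.
    return f"{collection}/{key}" if key else collection


print(s3_url_to_irods_path("s3://bucketname/a/b/c/filename"))
```

Under these assumptions, aws s3 ls s3://bucketname/a/b/c/ would list the same collection that ils shows for /tempZone/home/alice/bucket_root/a/b/c.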

Limitations / What's Missing

Multipart

Multipart has not been implemented for copy operations where x-amz-copy-source and x-amz-copy-source-range are used. When performing a copy from one iRODS file to another, multipart should be disabled.

See Disabling Multipart for details.

Multipart uploads of a local file are supported.

Tagging

iRODS has its own metadata system, but it is not clear how it should map to S3 metadata, so tagging is not supported at the moment.

Paging

Paging efficiently through lists of objects requires further engineering work, so this API does not currently paginate the output of operations such as ListObjects.

Checksum handling

Amazon S3 provides many ways to communicate checksums for the data as received by the server. iRODS provides MD5 checksums; however, this API does not use them to verify data objects created through PutObject.
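Since the API does not verify uploads itself, a client that wants verification could compute a digest locally and compare it with a checksum obtained from iRODS by other means. The sketch below computes a hex MD5 digest in chunks. The function name md5_hex and the chunk size are illustrative choices, and how the iRODS checksum is fetched and encoded is outside its scope.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Example with an in-memory stream; a real file would be opened with open(path, "rb").
local_digest = md5_hex(io.BytesIO(b"hello"))
print(local_digest)
```

Reading in chunks keeps memory use flat for large files, which matters for the multi-gigabyte objects typically moved through S3 clients.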

ETags

ETags are not provided or used consistently.

Versioning

Versioning is not supported at this time.

Docker

This project provides two Dockerfiles, one for building and one for running the application.

IMPORTANT: All commands in the sections that follow assume you are located in the root of the repository.

The Builder Image

The builder image is responsible for building the iRODS S3 API package. Before you can use it, you must build the image. To do that, run the following:

docker build -t irods-s3-api-builder -f irods_builder.Dockerfile .

With the builder image in hand, all that's left is to compile the source code for the S3 API project. The builder image is designed to compile code sitting on your machine. This is important because it gives you the ability to build any fork or branch of the project.

Building the package requires mounting the project into the container at the appropriate location. The command you run should look similar to the one below. Don't forget to create the directory which will hold your package!

docker run -it --rm \
    -v /path/to/irods_client_s3_api:/s3_api_source:ro \
    -v /path/to/packages_directory:/packages_output \
    irods-s3-api-builder

If everything succeeds, you will have a DEB package in the local directory you mapped to /packages_output.

Using a build cache

To keep build artifacts around for faster rebuilds, add a volume mount for /_build_s3_api so that build artifacts are stored on the host filesystem:

docker run -it --rm \
    -v /path/to/irods_client_s3_api:/s3_api_source:ro \
    -v /path/to/build_directory:/_build_s3_api \
    -v /path/to/packages_directory:/packages_output \
    irods-s3-api-builder

The Runner Image

The runner image is responsible for running the iRODS S3 API. Building the runner image requires the DEB package for the iRODS S3 API to exist on the local machine. See the previous section for details on generating the package.

To build the image, run the following command:

docker build -t irods-s3-api-runner -f irods_runner.Dockerfile /path/to/packages_directory

If all goes well, you will have a containerized iRODS S3 API server! You can verify this by checking the version information. Below is an example.

$ docker run -it --rm irods-s3-api-runner -v
irods_s3_api <version>-<build_sha>

Launching the Container

To run the containerized server, you need to provide a configuration file at the correct location. If you do not have a configuration file already, see Configuration for details.

To launch the server, run the following command:

docker run -d --rm --name irods_s3_api \
    -v /path/to/config/file:/config.json:ro \
    -p 9000:9000 \
    irods-s3-api-runner

The first thing the server will do is validate the configuration. If the configuration fails validation, the server will exit immediately. If the configuration passes validation, then congratulations, you now have a working iRODS S3 API server!

You can view the log output using docker logs -f or by passing -it to docker run instead of -d.

If for some reason the default schema file is not sufficient, you can instruct the iRODS S3 API to use a different schema file. See the following example.

# Generate the default JSON schema.
docker run -it --rm irods-s3-api-runner --dump-default-jsonschema > schema.json

# Tweak the schema.
vim schema.json

# Launch the server with the new schema file.
docker run -d --rm --name irods_s3_api \
    -v /path/to/config/file:/config.json:ro \
    -v ./schema.json:/jsonschema.json:ro \
    -p 9000:9000 \
    irods-s3-api-runner \
    --jsonschema-file /jsonschema.json

Stopping the Container

If the container was launched with -it, use CTRL-C or docker container stop <container_name> to shut it down.

If the container was launched with -d, use docker container stop <container_name>.

Building and running without Docker

Build Dependencies

iRODS development package
iRODS externals package for boost
iRODS externals package for nlohmann-json
iRODS externals package for spdlog
Curl development package
OpenSSL development package

Building from source

This project relies on git submodules and Docker for building the server.

Before the server can be built, you must download the appropriate git submodules. You can do that by running the following:

git submodule update --init --recursive

To build, follow the normal CMake steps.

mkdir build # Preferably outside of the repository
cd build
cmake /path/to/repository
make package # Use -j to use more parallelism.

Upon success, you should have an installable package.

If you run into issues, try checking if the git submodules exist on your machine.

Running without Docker

In order to run the iRODS S3 API server, you need a valid configuration file. See Configuration for details on how to create one.

Once the configuration file is prepared, run the following to launch the server:

irods_s3_api /path/to/config.json

To stop the server, you can use CTRL-C or send SIGINT or SIGTERM to the process.

Configuration

Before you can run the server, you'll need to create a configuration file.

You can generate a configuration file by running the following:

irods_s3_api --dump-config-template > config.json

IMPORTANT: --dump-config-template does not produce a fully working configuration. It must be updated before it can be used.
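As a quick check before launch, the sketch below walks a parsed configuration and reports every key or string value that still looks like an angle-bracket placeholder. The function name find_placeholders and the sample fragment are hypothetical. It assumes placeholders take the form "<...>", and it is not a substitute for the server's own validation.

```python
import json
import re

# Matches strings that consist entirely of one <placeholder>.
PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def find_placeholders(node, path=""):
    """Return the JSON paths whose key or string value is still a <placeholder>."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if PLACEHOLDER.match(key):
                found.append(child)
            found.extend(find_placeholders(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}/{index}"))
    elif isinstance(node, str) and PLACEHOLDER.match(node):
        found.append(path)
    return found


# A small fragment shaped like the template, used here instead of a real file.
template = json.loads('{"irods_client": {"host": "<string>", "port": 1247, "zone": "tempZone"}}')
print(find_placeholders(template))
```

An empty list means no placeholders remain; any reported path points at a value that still needs to be filled in.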

Configuration File Structure

The JSON structure below represents the default configuration.

Notice how some of the configuration values are wrapped in angle brackets (e.g. "<string>"). These are placeholder values that must be updated before launch.

IMPORTANT: The comments in the JSON structure are there for explanatory purposes and must not be included in your configuration. Failing to follow this requirement will result in the server failing to start up.

{
    // Defines options that affect how the client-facing component of the
    // server behaves.
    "s3_server": {
        // The hostname or IP address to bind.
        // "0.0.0.0" instructs the server to listen on all network interfaces.
        "host": "0.0.0.0",

        // The port used to accept incoming client requests.
        "port": 9000,

        // (Optional)
        // The minimum log level needed before logging activity.
        //
        // The following values are supported:
        // - trace
        // - debug
        // - info
        // - warn
        // - error
        // - critical
        "log_level": "info",

        // Defines the set of plugins to load.
        "plugins": {
            //
            // Each key corresponds to a plugin's .so file name, minus the
            // "lib" prefix.
            //

            "static_bucket_resolver": {
                // The internal name assigned to the plugin.
                "name": "static_bucket_resolver",

                // Defines the mapping between bucket names and iRODS
                // collections.
                "mappings": {
                    "<bucket_name>": "/path/to/collection"
                }
            },

            "static_authentication_resolver": {
                // The internal name assigned to the plugin.
                "name": "static_authentication_resolver",

                // Defines information for resolving an S3 username to an
                // iRODS username.
                "users": {
                    // Maps <s3_username> to a specific iRODS user.
                    // Each iRODS user that intends to access the S3 API must
                    // have at least one entry.
                    "<s3_username>": {
                        // The iRODS username to resolve to.
                        "username": "<string>",

                        // The secret key used to authenticate with the S3
                        // API for this user.
                        "secret_key": "<string>"
                    }
                }
            }
        },

        // Defines the region the server will report as being a member of.
        "region": "us-east-1",

        // Defines the location where part files are temporarily stored
        // on the irods_s3_api server before being streamed to iRODS. 
        "multipart_upload_part_files_directory": "/tmp",

        // Defines options that affect various authentication schemes.
        "authentication": {
            // The amount of time that must pass before checking for expired
            // bearer tokens.
            "eviction_check_interval_in_seconds": 60,

            // Defines options for the "Basic" authentication scheme.
            "basic": {
                // The amount of time before a user's authentication
                // token expires.
                "timeout_in_seconds": 3600
            }
        },

        // Defines options that affect how client requests are handled.
        "requests": {
            // The number of threads dedicated to servicing client requests.
            // When adjusting this value, consider adjusting "background_io/threads"
            // and "irods_client/connection_pool/size" as well.
            "threads": 3,

            // The maximum size allowed for the body of a request.
            "max_size_of_request_body_in_bytes": 8388608,

            // The amount of time allowed to service a request. If the timeout
            // is exceeded, the client's connection is terminated immediately.
            "timeout_in_seconds": 30
        },

        // Defines options that affect tasks running in the background.
        // These options are primarily related to long-running tasks.
        "background_io": {
            // The number of threads dedicated to background I/O.
            "threads": 6
        }
    },

    // Defines iRODS connection information.
    "irods_client": {
        // The hostname or IP of the target iRODS server.
        "host": "<string>",

        // The port of the target iRODS server.
        "port": 1247,

        // The zone of the target iRODS server.
        "zone": "<string>",

        // Defines options for secure communication with the target iRODS server.
        "tls": {
            // Controls whether the client and server communicate using TLS.
            //
            // The following values are supported:
            // - CS_NEG_REFUSE:    Do not use secure communication.
            // - CS_NEG_REQUIRE:   Demand secure communication.
            // - CS_NEG_DONT_CARE: Let the server decide.
            "client_server_policy": "CS_NEG_REFUSE",

            // The file containing trusted CA certificates in PEM format.
            //
            // Note that the certificates in this file are used in conjunction
            // with the system default trusted certificates.
            "ca_certificate_file": "<string>",

            // The file containing the server's certificate chain.
            //
            // The certificates must be in PEM format and must be sorted
            // starting with the subject's certificate (actual client or server
            // certificate), followed by intermediate CA certificates if
            // applicable, and ending at the highest level (root) CA.
            "certificate_chain_file": "<string>",

            // The file containing Diffie-Hellman parameters.
            "dh_params_file": "<string>",

            // Defines the level of server certificate authentication to
            // perform.
            //
            // The following values are supported:
            // - none:     Authentication is skipped.
            // - cert:     The server verifies the certificate is signed by
            //             a trusted CA.
            // - hostname: Equivalent to "cert", but also verifies the FQDN
            //             of the iRODS server matches either the common
            //             name or one of the subjectAltNames.
            "verify_server": "cert"
        },

        // Controls how the S3 API communicates with the iRODS server.
        //
        // When set to true, the following applies:
        // - Only APIs supported by the iRODS 4.2 series will be used.
        // - Connection pool settings are ignored.
        // - All HTTP requests will be served using a new iRODS connection.
        //
        // When set to false, the S3 API will take full advantage of the
        // iRODS server's capabilities.
        //
        // This option should be used when the S3 API is configured to
        // communicate with an iRODS 4.2 server.
        "enable_4_2_compatibility": false,

        // The credentials for the rodsadmin user that will act as a proxy
        // for all authenticated users.
        "proxy_admin_account": {
            "username": "<string>",
            "password": "<string>"
        },

        // Defines options for the connection pool.
        "connection_pool": {
            // The number of connections in the pool.
            "size": 6,

            // (Optional)
            // The amount of time that must pass before a connection is
            // renewed (i.e. replaced).
            "refresh_timeout_in_seconds": 600,

            // (Optional)
            // The number of times a connection can be fetched from the pool
            // before it is refreshed.
            "max_retrievals_before_refresh": 16,

            // (Optional)
            // Instructs the connection pool to track changes in resources.
            // If a change is detected, all connections will be refreshed.
            "refresh_when_resource_changes_detected": true
        },

        // The resource to target for all write operations.
        "resource": "<string>",

        // The buffer size used to read objects from the client
        // and write to iRODS.
        "put_object_buffer_size_in_bytes": 8192,

        // The buffer size used to read objects from iRODS
        // and send to the client.
        "get_object_buffer_size_in_bytes": 8192
    }
}
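Because the annotated structure above is not valid JSON, one way to reuse it is to strip the comment lines before parsing. The sketch below does that for full-line // comments only, which is the only comment form used above. The function name strip_line_comments and the shortened sample are hypothetical, and trailing comments on the same line as a value are deliberately not handled.

```python
import json


def strip_line_comments(text):
    """Drop lines that contain only a // comment, leaving everything else as-is."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return "\n".join(kept)


# A shortened, annotated sample in the same style as the structure above.
annotated = """
{
    // The port used to accept incoming client requests.
    "port": 9000,

    // The hostname or IP address to bind.
    "host": "0.0.0.0"
}
"""

config = json.loads(strip_line_comments(annotated))
print(config)
```

Only whole comment lines are removed, so a value that happens to contain "//", such as a URL, is left untouched.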

Connecting with Botocore

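As a simple example, this is how you pass the S3 username and secret key from the configuration file to botocore, a library from Amazon that provides S3 connectivity. Set endpoint_url to the host and port the server listens on.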

import botocore.session

session = botocore.session.get_session()
client = session.create_client("s3",
                               use_ssl=False,
                               endpoint_url="http://127.0.0.1:9000",
                               aws_access_key_id="<username>",
                               aws_secret_access_key="<secret key>")
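The client arguments can also be derived from the server's configuration file so the two stay in sync. The sketch below builds them from a parsed configuration. The function name client_kwargs, the user s3_alice, and the secret are hypothetical. It assumes the access key ID is the S3 username from the static_authentication_resolver plugin and that the server is reachable at the given host.

```python
def client_kwargs(config, s3_username, host="127.0.0.1"):
    """Build keyword arguments for create_client("s3", ...) from a parsed config."""
    server = config["s3_server"]
    users = server["plugins"]["static_authentication_resolver"]["users"]
    return {
        "use_ssl": False,
        "endpoint_url": f"http://{host}:{server['port']}",
        "region_name": server["region"],
        "aws_access_key_id": s3_username,
        "aws_secret_access_key": users[s3_username]["secret_key"],
    }


# Hypothetical values standing in for a parsed config.json.
example_config = {
    "s3_server": {
        "port": 9000,
        "region": "us-east-1",
        "plugins": {
            "static_authentication_resolver": {
                "users": {"s3_alice": {"username": "alice", "secret_key": "s3cr3t"}}
            }
        },
    }
}

kwargs = client_kwargs(example_config, "s3_alice")
print(kwargs["endpoint_url"])
# The result can be passed on as: session.create_client("s3", **kwargs)
```

The host is a separate parameter because the configured bind address (for example "0.0.0.0") is not necessarily the address a client connects to.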

Disabling Multipart

Multipart copies are not supported at this time. Therefore, multipart must be disabled in the client.

Disabling Multipart for AWS CLI

For AWS CLI, multipart can be disabled by setting an arbitrarily large multipart threshold. Since 5 GB is the largest single part allowed by AWS, this is a good choice.

To disable multipart, set the multipart_threshold in the ~/.aws/credentials file for the profile in question. For example, you could create a profile called irods_s3_no_multipart with the following in the credentials file.

[irods_s3_no_multipart]
aws_access_key_id = key1 
aws_secret_access_key = secret_key1
s3 =
    multipart_threshold = 5GB

To use this with the AWS CLI commands, use the --profile flag. Example: aws --profile irods_s3_no_multipart.

Example for Boto3

To set the multipart threshold with a boto3 client, do the following:

from boto3.s3.transfer import TransferConfig
# boto3_client is an existing boto3 S3 client.
config = TransferConfig(multipart_threshold=5*1024*1024*1024)
boto3_client.upload_file(put_filename, bucket_name, key, Config=config)

Example for MinIO mc client

The mc cp command has a --disable-multipart option. Here is an example of a copy with a myminio alias:

mc cp --disable-multipart put_file myminio/bucket_name/put_filename

Note: The MinIO client uses aliases to group a URL with its keys. Refer to the mc alias command for information on setting, listing, and removing aliases.

Running Tests

It is recommended to cd to the tests/docker directory in this repository when running the docker compose commands below. Otherwise, you must specify the Compose project directory with the --project-directory option.

Run the following to run the full test suite (assumes your current working directory is the root directory of this repository):

cd tests/docker
docker compose build
docker compose run --rm test-runner

The test output will appear in the terminal. Once the tests complete, run the following to clean up:

docker compose down
docker volume prune # Use -f to skip the confirmation prompt

Note: If you get an error like 'name' does not match any of the regexes: '^x-' then you will need to upgrade your version of docker compose.

Running specific tests

To run one or more specific tests by name, override the command for the test-runner service container. The unittest command line interface (CLI) is run in the entrypoint script for the service image and the command is passed as arguments to the script.

The most straightforward way to override the command is to specify it as an argument after the service name with docker compose run. Here is an example of how to run a specific test:

docker compose run --rm test-runner listbuckets_test.ListBuckets_Test.test_aws_list_bucket

You can also specify a list of tests and test modules:

docker compose run --rm test-runner listbuckets_test.ListBuckets_Test.test_aws_list_bucket abortmultipartupload_test

And use any of the supported options for the unittest CLI:

docker compose run --rm test-runner -k aws -b listbuckets_test

For more information about docker compose run, see the Docker Compose documentation. For more information about the unittest CLI, see the Python documentation.

The command can also be overridden by modifying the command attribute of the test-runner service definition stanza in the Docker Compose file. Add the following line to the stanza:

# compose.yml

# ...

    test-runner:
        # ...

        # List whatever tests and options are needed just like the docker compose run invocations above.
        command: listbuckets_test.ListBuckets_Test.test_aws_list_bucket abortmultipartupload_test

        # ...

With this method, the command will be overridden by default and does not need to be specified when running docker compose run. So, with the configuration for the test-runner service shown above, the following commands are equivalent:

docker compose run --rm test-runner listbuckets_test.ListBuckets_Test.test_aws_list_bucket abortmultipartupload_test

docker compose run --rm test-runner

For more information about the Docker Compose command attribute, see the Docker Compose documentation.

`irods-s3-api` service build oddities

The topics discussed in this section will be addressed in #159, at which point this section can be removed.

The irods-s3-api service is built from the irods_runner.Dockerfile found in the root directory of this repository. The Dockerfile requires that a DEB package file for the S3 API exist in the build context. This means that the DEB package file must exist in the root directory of this repository.

One way to accomplish this is to build the S3 API packages with The Builder Image and set the /packages_output mount point to the root directory of this repository, so that the built packages land in the expected location. Here is an example (this assumes that the builder image has already been built):

docker run -it --rm \
    -v .:/s3_api_source:ro \
    -v .:/packages_output \
    irods-s3-api-builder

See The Runner Image for more details about building the S3 API Docker image.
