Tuesday, September 11, 2018

How to access the Amazon Simple Storage Service (Amazon S3) filesystem by passing command-line properties?

There is no need to set core-site.xml to access Amazon S3; simply pass
the configuration as Java properties on the command line.
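
For instance, any S3A setting that would normally live in core-site.xml can be supplied inline with a -D flag. The sketch below uses fs.s3a.connection.maximum purely as an illustrative property and a placeholder bucket name; credentials are handled by the options that follow:

hadoop fs -Dfs.s3a.connection.maximum=50 -ls s3a://your-bucket/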

This article shows how to access Amazon S3 by passing parameters to the
`hadoop` command line. This is helpful for testing access before
hardcoding the configuration parameters in the HDP cluster (which would
require a restart), or simply for writing scripts to do a task. To
provide the Access Key and Secret Key, check out the examples below:

Option 1 (Secure): Generate a JCEKS keystore, then list and distcp files.

Create the JCEKS keystore and store the access key:

hadoop credential create fs.s3a.access.key -value '<Access-key>' -provider jceks:///tmp/aws.jceks

Then store the secret key:

hadoop credential create fs.s3a.secret.key -value '<Secret-key>' -provider jceks:///tmp/aws.jceks
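
To confirm that both entries were written, you can list the aliases stored in the keystore with the credential command; it should show the fs.s3a.access.key and fs.s3a.secret.key aliases:

hadoop credential list -provider jceks:///tmp/aws.jceks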

Now, use the JCEKS keystore to list and distcp files:

hadoop fs -Dhadoop.security.credential.provider.path=jceks:///tmp/aws.jceks -ls s3a://your-bucket/

hadoop distcp -Dhadoop.security.credential.provider.path=jceks:///tmp/aws.jceks /tmp/hello.txt s3a://your-bucket/
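
If you want to wrap this up as a repeatable script, a minimal bash sketch might look like the following; the keystore path, source file, and bucket name are placeholders to replace with your own values:

#!/bin/bash
# Minimal sketch: verify access, then copy a local file to S3 using the JCEKS keystore.
# Placeholder values -- replace with your own keystore path, source file, and bucket.
PROVIDER="jceks:///tmp/aws.jceks"
SRC="/tmp/hello.txt"
DEST="s3a://your-bucket/"

# List the bucket first to confirm access, then copy the file with distcp.
hadoop fs -Dhadoop.security.credential.provider.path="$PROVIDER" -ls "$DEST" && \
hadoop distcp -Dhadoop.security.credential.provider.path="$PROVIDER" "$SRC" "$DEST"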

Option 2 (Less Secure): Provide the access key and secret key in clear
text as Java properties on the command line (no keystore is generated).

Use the following commands to list and distcp files:

hadoop fs -Dfs.s3a.access.key=<Access-key> -Dfs.s3a.secret.key=<Secret-key> -ls s3a://your-bucket/

hadoop distcp -Dfs.s3a.access.key=<Access-key> -Dfs.s3a.secret.key=<Secret-key> /tmp/hello.txt s3a://your-bucket/
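
If you go this route, a small convenience (still clear text, so the same caveat applies) is to keep the keys in shell variables so they are typed only once; this is just a sketch with placeholder values:

# Placeholder values -- substitute your real keys (they remain visible in the shell history and process list).
ACCESS_KEY='<Access-key>'
SECRET_KEY='<Secret-key>'

hadoop fs -Dfs.s3a.access.key="$ACCESS_KEY" -Dfs.s3a.secret.key="$SECRET_KEY" -ls s3a://your-bucket/
hadoop distcp -Dfs.s3a.access.key="$ACCESS_KEY" -Dfs.s3a.secret.key="$SECRET_KEY" /tmp/hello.txt s3a://your-bucket/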

Extra Option: If the bucket is associated with a different endpoint, you
can override the endpoint as a Java property. Add the following property
to the command line:

-Dfs.s3a.endpoint=s3.us-east-2.amazonaws.com

Example:

hadoop fs -Dfs.s3a.endpoint=s3.us-east-2.amazonaws.com -Dhadoop.security.credential.provider.path=jceks:///tmp/aws.jceks -ls s3a://your-bucket/
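
The endpoint property can be combined with either option; for instance, a distcp with clear-text keys against the same alternate endpoint might look like this (keys and bucket are placeholders):

hadoop distcp -Dfs.s3a.endpoint=s3.us-east-2.amazonaws.com -Dfs.s3a.access.key=<Access-key> -Dfs.s3a.secret.key=<Secret-key> /tmp/hello.txt s3a://your-bucket/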