Sunday, May 6, 2007

Amazon Simple Storage Service [S3]

What is S3?

Simple Storage Service (S3) is a web service that gives any developer access to highly scalable, very reliable, inexpensive storage space. Your data is replicated to multiple servers at multiple data centers.

How to get started

Go to Amazon’s AWS page, then to the S3 page, and sign up (check out the other services while you are there).

Pricing (From Amazon’s site…)

New Pricing (effective June 1st, 2007)

Storage
$0.15 per GB-Month of storage used

Data Transfer
$0.10 per GB - all data uploaded

$0.18 per GB - first 10 TB / month data downloaded
$0.16 per GB - next 40 TB / month data downloaded
$0.13 per GB - data downloaded / month over 50 TB

Data transferred between Amazon S3 and Amazon EC2 is free of charge

Requests
$0.01 per 1,000 PUT or LIST requests
$0.01 per 10,000 GET and all other requests*
* No charge for delete requests

Storage and bandwidth size includes all file overhead

I looked around the web for similar services (it is hard to find providers that post their prices), and for 180 GB of redundantly stored data they were in the $200/month price range.

The same 180 GB on S3 would cost:

$27 to store for 30 days (180 GB × $0.15)

$18 to transfer (the entire 180 GB) to S3 (180 GB × $0.10)
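The arithmetic above is easy to script. Here is a minimal sketch in Python (used purely for illustration) that applies the rates from the price list. The function names are my own, and treating 1 TB as 1,024 GB for the download tiers is an assumption, since the price list does not say.

```python
from decimal import Decimal

# Rates copied from the price list above (effective June 1st, 2007).
STORAGE_PER_GB_MONTH = Decimal("0.15")
UPLOAD_PER_GB = Decimal("0.10")

# Download tiers as (tier size in GB, price per GB); None means "everything above".
# Assumption: 1 TB is treated as 1,024 GB; the price list does not say.
DOWNLOAD_TIERS = [
    (10 * 1024, Decimal("0.18")),
    (40 * 1024, Decimal("0.16")),
    (None, Decimal("0.13")),
]


def storage_cost(gb, months=1):
    """Cost of keeping `gb` gigabytes stored for `months` months."""
    return STORAGE_PER_GB_MONTH * gb * months


def upload_cost(gb):
    """Cost of transferring `gb` gigabytes into S3."""
    return UPLOAD_PER_GB * gb


def download_cost(gb):
    """Cost of downloading `gb` gigabytes in one month, tier by tier."""
    remaining = Decimal(gb)
    total = Decimal("0")
    for size, rate in DOWNLOAD_TIERS:
        portion = remaining if size is None else min(remaining, Decimal(size))
        total += portion * rate
        remaining -= portion
        if remaining <= 0:
            break
    return total


if __name__ == "__main__":
    print("storage: ", storage_cost(180))    # 180 GB kept for one month
    print("upload:  ", upload_cost(180))     # sending all 180 GB once
    print("download:", download_cost(180))   # pulling all 180 GB back once
```

Request charges (PUT, LIST, GET) are left out to keep the sketch short; for a handful of large files they add very little.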

In addition to price, with S3, you are in full control of how and when you put and/or get your data.

Only pay for what you actually use

One of the nice things about S3 (and the other Amazon web services) is that you pay only for what you use.

Didn’t use the service last month? Pay $0.

This allows you to ‘tinker’ all you want for mere pennies.

Amazon S3 - Objects

What is an Object?

Object is the term we use in S3 for the ‘thing’ (file/data) you want to store.
Once an object is stored in S3, it contains the original data (the contents of the file) plus some metadata (name/value pairs).

You can add your own metadata, but some of the standard entries are ‘Last-Modified’ and ‘Content-Type’.

A given Object can be from 1 byte to 5 GB in size.
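To make the data-plus-metadata idea concrete, here is a small Python sketch (illustration only) that builds the headers describing an object before it is stored. The `x-amz-meta-` prefix for user metadata is the one that appears in the PUT example later in this post; the function name and the 1 GB = 1024³ bytes reading of the size limit are my own assumptions.

```python
MAX_OBJECT_BYTES = 5 * 1024 ** 3  # 5 GB upper bound; assumes 1 GB = 1024**3 bytes


def object_headers(data, content_type, user_metadata=None):
    """Build the HTTP headers that describe an object about to be stored."""
    if not 1 <= len(data) <= MAX_OBJECT_BYTES:
        raise ValueError("object must be between 1 byte and 5 GB")
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(data)),
    }
    # User-defined name/value pairs travel as x-amz-meta-* headers.
    for name, value in (user_metadata or {}).items():
        headers["x-amz-meta-" + name.lower()] = value
    return headers


if __name__ == "__main__":
    print(object_headers(b"this is a test", "text/plain", {"title": "my title"}))
```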

Amazon S3 - Buckets

Why Buckets?

A bucket provides a unique namespace for managing the objects it contains.

Bucket names are global across all of S3, shared by every S3 user (a similar concept to ‘domain names’).

An S3 account is allowed 100 buckets.

Amazon S3 - Keys

Key

A key is the unique identifier for an object within a bucket

Locating an object

Any Object can be located by its [bucket + key] using a REST-style URL:

   http://s3.amazonaws.com/foo-products/2006/may/1845.prd

Here foo-products is the bucket and 2006/may/1845.prd is the key.
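The [bucket + key] mapping is simple enough to sketch in a few lines of Python (illustration only). The helper names are hypothetical, and percent-encoding the key while leaving "/" alone is my assumption about what you would want for keys like the one above.

```python
from urllib.parse import quote

S3_ENDPOINT = "http://s3.amazonaws.com"  # host taken from the example URL above


def object_url(bucket, key):
    """Return the URL for `key` inside `bucket`."""
    # Keep "/" unescaped so keys such as 2006/may/1845.prd stay readable.
    return "{}/{}/{}".format(S3_ENDPOINT, bucket, quote(key, safe="/"))


def split_object_path(path):
    """Split "/bucket/key" back into (bucket, key); the key may contain slashes."""
    bucket, _, key = path.lstrip("/").partition("/")
    return bucket, key


if __name__ == "__main__":
    print(object_url("foo-products", "2006/may/1845.prd"))
    print(split_object_path("/foo-products/2006/may/1845.prd"))
```

Note that only the first path segment is the bucket; everything after it, slashes included, is the key.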

S3 - Authentication

Most requests to S3 require authentication. This ensures that you don’t get charged for operations you didn’t authorize and that nobody else sees your private data.

You can apply one of several access policies (ACLs) to an Object or an entire Bucket:

  • private
  • public-read
  • public-read-write
  • authenticated-read

To set the ACL, send an x-amz-acl header when you PUT the Object to S3. For example…

x-amz-acl: public-read

The ACL defaults to private if it is not set on the PUT.
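A tiny Python helper (illustration only, the name is my own) shows how the four values and the default fit together: pick one of the listed ACLs and it becomes an x-amz-acl header, or send no header at all and the object stays private.

```python
# The four access policies listed above.
CANNED_ACLS = ("private", "public-read", "public-read-write", "authenticated-read")


def acl_header(acl=None):
    """Return the header dict to send with a PUT for the requested ACL."""
    if acl is None:
        return {}  # no header sent, so the default (private) applies
    if acl not in CANNED_ACLS:
        raise ValueError("unknown ACL: {!r}".format(acl))
    return {"x-amz-acl": acl}


if __name__ == "__main__":
    print(acl_header("public-read"))
    print(acl_header())
```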

S3: Putting it all Together

How do you speak to S3?

At this point, all interaction is done over HTTP (the current exception is that you can retrieve objects using either HTTP or BitTorrent).

So, creating a program to interact with S3 is just a matter of creating HTTP requests and reading HTTP responses, something PHP is quite capable of (especially with a little help from PEAR HTTP_Request and Crypt_HMAC).

First get an account and your Keys

  • Access Key ID: You add this to any requests to S3. Essentially it is your unique identifier that tells S3 a given request is targeted for your account.
  • Secret Access Key: For requests that require authentication (e.g. for Objects whose ACL is private or authenticated-read), you ‘sign’ your request with this secret key.
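Here is roughly what ‘signing’ means, sketched in Python for illustration (in PHP, Crypt_HMAC would play the role of the hmac module). As I understand the S3 REST docs, you build a string from the verb, Content-MD5, Content-Type, Date, any x-amz-* headers and the resource path, take its HMAC-SHA1 with your Secret Access Key, and Base64-encode the result. The function names and the key values below are placeholders of mine, and the header canonicalization is simplified (no multi-value headers, no query-string sub-resources).

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, resource, date, content_type="", content_md5="", amz_headers=None):
    """Assemble the text that gets signed for one request."""
    # x-amz-* headers are lower-cased, sorted by name, and written one per line.
    pairs = sorted((k.lower(), v.strip()) for k, v in (amz_headers or {}).items())
    amz = "".join("{}:{}\n".format(name, value) for name, value in pairs)
    return "\n".join([verb, content_md5, content_type, date]) + "\n" + amz + resource


def authorization_header(access_key_id, secret_access_key, to_sign):
    """Return the value for the Authorization header."""
    digest = hmac.new(
        secret_access_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "AWS {}:{}".format(access_key_id, signature)


if __name__ == "__main__":
    text = string_to_sign(
        "PUT",
        "/my-bucket/my-key",
        "Wed, 08 Mar 2006 04:06:16 GMT",
        content_type="text/plain",
        amz_headers={"x-amz-meta-title": "my title"},
    )
    # Placeholder credentials, not real keys.
    print(authorization_header("EXAMPLE-KEY-ID", "example-secret", text))
```

The printed value is what goes where the request examples in this post show `AWS [aws-access-key-id]:[header-signature]`.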

The following code examples are based on what is in the Amazon S3 developer docs

Create a Bucket

# create bucket request

PUT /[bucket-name] HTTP/1.0
Date: Wed, 08 Mar 2006 04:06:15 GMT
Authorization: AWS [aws-access-key-id]:[header-signature]
Host: s3.amazonaws.com

# create bucket response

HTTP/1.1 200 OK
x-amz-id-2: VjzdTviQorQtSjcgLshzCZSzN+7CnewvHA+6sNxR3VRcUPyO5fmSmo8bWnIS52qa
x-amz-request-id: 91A8CC60F9FC49E7
Date: Wed, 08 Mar 2006 04:06:15 GMT
Location: /[bucket-name]
Content-Length: 0
Connection: keep-alive
Server: AmazonS3

Put Objects in your Bucket

# put object request

PUT /[bucket-name]/[key-name] HTTP/1.0
Date: Wed, 08 Mar 2006 04:06:16 GMT
Authorization: AWS [aws-access-key-id]:[header-signature]
Host: s3.amazonaws.com
Content-Length: 14
x-amz-meta-title: my title
Content-Type: text/plain

this is a test

# put object response

HTTP/1.1 200 OK
x-amz-id-2: wc15E1LUrjDZhNtT4QZtsbtadnOMKGjw5QTxkRDVO1owwbA6YoiqJJEuKShopufw
x-amz-request-id: 7487CD42C5CA7524
Date: Wed, 08 Mar 2006 04:06:16 GMT
ETag: "54b0c58c7ce9f2a8b551351102ee0938"
Content-Length: 0
Connection: keep-alive
Server: AmazonS3
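The PUT request above can be assembled by hand, which is a good way to see where every line break goes. This is a minimal Python sketch (illustration only): the function name is hypothetical, the Authorization value is passed in ready-made, and nothing is actually sent over the network.

```python
def build_put_request(bucket, key, body, date, authorization,
                      content_type="text/plain", extra_headers=None):
    """Return the raw bytes of a PUT Object request."""
    headers = [
        ("Date", date),
        ("Authorization", authorization),
        ("Host", "s3.amazonaws.com"),
        ("Content-Length", str(len(body))),
        ("Content-Type", content_type),
    ]
    headers.extend((extra_headers or {}).items())
    lines = ["PUT /{}/{} HTTP/1.0".format(bucket, key)]
    lines.extend("{}: {}".format(name, value) for name, value in headers)
    # Header lines end with CRLF; one empty line separates headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    raw = build_put_request(
        "my-bucket", "my-key", b"this is a test",
        "Wed, 08 Mar 2006 04:06:16 GMT",
        "AWS EXAMPLE-KEY-ID:EXAMPLE-SIGNATURE",  # placeholder, not a real signature
        extra_headers={"x-amz-meta-title": "my title"},
    )
    print(raw.decode("ascii"))
```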

Retrieve Objects from your bucket

# get object request

GET /[bucket-name]/[key-name] HTTP/1.0
Date: Wed, 08 Mar 2006 04:06:18 GMT
Authorization: AWS [aws-access-key-id]:[header-signature]
Host: s3.amazonaws.com

# get object response

HTTP/1.1 200 OK
x-amz-id-2: FbGpiykb9oJEdJd0bcfwkL6S3lc06X0y7XSeA/GWyRdvlNEZ0irthljxKoeGFfB6
x-amz-request-id: 9298531013923634
Date: Wed, 08 Mar 2006 04:06:18 GMT
Last-Modified: Wed, 08 Mar 2006 04:06:16 GMT
ETag: "54b0c58c7ce9f2a8b551351102ee0938"
x-amz-meta-title: my title
Content-Type: text/plain
Content-Length: 14
Connection: keep-alive
Server: AmazonS3

this is a test
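Reading a response like the one above is the mirror image of writing a request: a status line, header lines, an empty line, then the body. A minimal Python sketch (illustration only, hypothetical function name) that pulls those pieces apart, assuming the whole response is already in memory:

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Last-Modified: Wed, 08 Mar 2006 04:06:16 GMT\r\n"
        b"x-amz-meta-title: my title\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: 14\r\n"
        b"\r\n"
        b"this is a test"
    )
    status, headers, body = parse_response(raw)
    print(status, headers["x-amz-meta-title"], body)
```

Header names are lower-cased on the way in, so the metadata you stored with the object comes back under its `x-amz-meta-` name regardless of how the server capitalizes it.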

S3 - The PHP way

Implementing an API to S3 with PHP

Prerequisites

You’ll need the PEAR libraries Crypt_HMAC and HTTP_Request (or at least, things are much easier if you have them).

sam$ sudo pear install Crypt_HMAC
pear.php.net" to update
downloading Crypt_HMAC-1.0.1.tgz ...
Starting to download Crypt_HMAC-1.0.1.tgz (2,149 bytes)
....done: 2,149 bytes
install ok: channel://pear.php.net/Crypt_HMAC-1.0.1
sam$ sudo pear install HTTP_Request
pear.php.net" to update
downloading HTTP_Request-1.4.0.tgz ...
Starting to download HTTP_Request-1.4.0.tgz (15,262 bytes)
.....done: 15,262 bytes
downloading Net_URL-1.0.14.tgz ...
Starting to download Net_URL-1.0.14.tgz (5,173 bytes)
...done: 5,173 bytes
downloading Net_Socket-1.0.7.tgz ...
Starting to download Net_Socket-1.0.7.tgz (5,419 bytes)
...done: 5,419 bytes
install ok: channel://pear.php.net/Net_URL-1.0.14
install ok: channel://pear.php.net/Net_Socket-1.0.7
install ok: channel://pear.php.net/HTTP_Request-1.4.0

Creating the API

At this point all you really need to do is create a function for each needed interaction with S3 (or better yet, a PHP Object with a method for each). So something like…

createBucket()
putObject()
getObject()
getBucketListing()
   ...
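The shape of such an object, with one method per operation, might look like the sketch below. It is written in Python for illustration rather than PHP, the method names simply mirror the list above, and `send` is a hypothetical callable that does the actual signing and HTTP work (in PHP that is where Crypt_HMAC and HTTP_Request would come in).

```python
class S3Client:
    """Thin wrapper with one method per S3 operation.

    `send` is a hypothetical callable (method, path, headers, body) -> response
    that performs the signed HTTP request; it is injected so the class can be
    exercised without touching the network.
    """

    def __init__(self, send):
        self._send = send

    def create_bucket(self, bucket):
        return self._send("PUT", "/" + bucket, {}, b"")

    def put_object(self, bucket, key, data, content_type="text/plain"):
        headers = {"Content-Type": content_type, "Content-Length": str(len(data))}
        return self._send("PUT", "/{}/{}".format(bucket, key), headers, data)

    def get_object(self, bucket, key):
        return self._send("GET", "/{}/{}".format(bucket, key), {}, b"")

    def get_bucket_listing(self, bucket):
        # Listing a bucket is a GET on the bucket itself.
        return self._send("GET", "/" + bucket, {}, b"")


if __name__ == "__main__":
    # A stand-in transport that just echoes what it was asked to send.
    client = S3Client(lambda method, path, headers, body: (method, path))
    print(client.create_bucket("my-bucket"))
    print(client.put_object("my-bucket", "my-key", b"this is a test"))
```

Keeping the transport separate means each method is little more than a verb and a path, and the fiddly parts live in one place.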

These functions will create HTTP requests and read HTTP responses. This can be a bit tricky (one missing ‘\n’ and you’re screwed), so leverage what others have done before you. The Amazon Web Services site has some good examples; in particular, I would recommend you look at ‘Test Utility for Amazon S3 in PHP’, which does a good job of demoing most of the S3 functionality using PHP.

I used this code as a starting point to develop a very simple ‘Rsync’ type application for Amazon S3.
