Maybe I'm late to the party, but I just discovered that you can set the Cache-Control header on objects in Amazon S3.
What does setting the Cache-Control header mean?
Straight from the RFC (RFC 2616, section 14.9):
The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response. These directives typically override the default caching algorithms. Cache directives are unidirectional in that the presence of a directive in a request does not imply that the same directive is to be given in the response.
In simple words, this means that every cache between you and the requesting browser may keep the file for the time span given in the header (max-age, in seconds). Even the browser itself caches the file: if a user encounters, for example, an image from S3 they have already loaded, the browser will not request that image again until the time span has passed.
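To make the max-age semantics concrete, here is a minimal Ruby sketch of how a cache might read the header. The method names parse_cache_control and fresh? are made up for illustration and are not part of any gem. Directives are comma-separated, max-age is a number of seconds, and a stored copy counts as fresh while its age is below max-age. Real caches apply more rules from the RFC (no-cache, s-maxage, and so on), which this sketch ignores.

```ruby
# Hypothetical helper: split a Cache-Control value into a hash of directives.
# Directives without a value (e.g. "public") map to true; numeric values become Integers.
def parse_cache_control(value)
  value.split(',').each_with_object({}) do |part, directives|
    name, arg = part.strip.split('=', 2)
    next if name.nil? || name.empty?
    directives[name.downcase] = arg.nil? ? true : (arg =~ /\A\d+\z/ ? arg.to_i : arg)
  end
end

# Simplified freshness check: a cached copy is fresh while its age in seconds
# is below max-age. All other caching rules from the RFC are ignored here.
def fresh?(cache_control, age_in_seconds)
  max_age = parse_cache_control(cache_control)['max-age']
  max_age.is_a?(Integer) && age_in_seconds < max_age
end

# One year expressed in seconds: 365 * 24 * 60 * 60 = 31536000
ONE_YEAR = 365 * 24 * 60 * 60
```

With these helpers, fresh?('public, max-age=31536000', 30 * 24 * 60 * 60) returns true for a month-old copy, while passing 2 * ONE_YEAR as the age returns false.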
Everybody wins: the page loads faster for the end user, and you pay less for traffic.
If you use Ruby and the RightScale right_aws gem, here is a simple request that sets the header for caching. Start irb:
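require 'rubygems'
require 'right_aws'
# ACCESS_KEY and SECRET_ACCESS are placeholders for your AWS credentials
s3 = RightAws::S3Interface.new(ACCESS_KEY, SECRET_ACCESS, {:multi_thread => true})
s3.create_bucket('test.bigcurl.de')
# max-age is in seconds: 31536000 = 365 days.
# x-amz-acl makes the object publicly readable, so anonymous requests (like curl) succeed.
s3.put('test.bigcurl.de', 'untitled.txt', 'Cache me if you can!',
       {'Content-Type' => 'text/plain', 'Cache-Control' => 'public, max-age=31536000', 'x-amz-acl' => 'public-read'})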
Fetching the object's headers with curl shows the result:
curl -I s3.amazonaws.com/test.bigcurl.de/untitled.txt
HTTP/1.1 200 OK
x-amz-id-2: QWpqMS6h32b+
x-amz-request-id: ECC1EF0ABCAA0AD6
Date: Wed, 19 Nov 2008 00:27:19 GMT
Cache-Control: public, max-age=31536000
Last-Modified: Wed, 19 Nov 2008 00:23:35 GMT
ETag: "ce114e4501d2f4e2dcea3e17b546f339"
Content-Type: text/plain
Content-Length: 14
Server: AmazonS3
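If you would rather check the header from Ruby than with curl, a HEAD request with Net::HTTP from the standard library does the same job. This is a minimal sketch: the method name cache_control_of is made up, and the object must be publicly readable for an anonymous request to succeed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: send a HEAD request (like `curl -I`) and return the
# Cache-Control response header, or nil if the server did not send one.
def cache_control_of(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.head(uri.request_uri)
    response['Cache-Control'] # header lookup is case-insensitive
  end
end
```

Calling cache_control_of('http://s3.amazonaws.com/test.bigcurl.de/untitled.txt') for the object uploaded above should return the same value that curl printed.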
Pretty cool. And since the header is still intact when the file is served through Amazon CloudFront, you benefit there as well.
Keep in mind that some files are better suited to caching than others. Static content such as images, CSS files, and JavaScript files are good candidates. Dynamically generated data whose content may change over time is not.
So be careful which file types you give a long cache lifetime: once a browser or intermediate cache has stored a file, you cannot expire it later from the server.
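One way to follow this advice is to choose the header from the file extension before uploading. The sketch below is a hypothetical helper; the extension list and the no-cache fallback are assumptions to adapt to your own content.

```ruby
# Hypothetical helper: pick a Cache-Control value for an upload based on its
# file extension. Static assets get a one-year lifetime; anything else must be
# revalidated with the server before a cached copy is reused.
LONG_CACHE_EXTENSIONS = %w[.jpg .jpeg .png .gif .css .js].freeze
ONE_YEAR_IN_SECONDS = 365 * 24 * 60 * 60

def cache_control_for(filename)
  if LONG_CACHE_EXTENSIONS.include?(File.extname(filename).downcase)
    "public, max-age=#{ONE_YEAR_IN_SECONDS}"
  else
    'no-cache'
  end
end
```

The result can go straight into the headers hash passed to s3.put, e.g. 'Cache-Control' => cache_control_for('flower.jpg').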
To avoid at least a few hiccups, implement a versioning scheme like this: flower.jpg becomes flower.1.jpg. When you upload a newer version of the flower picture, simply increment the number, e.g. flower.2.jpg. Because it is a new URL, no cache holds an old copy of it, so the new version is available immediately; and since your app now generates links with the new filename, the old file is no longer served.
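The renaming rule is easy to put in one place so that uploads and link generation always agree. A minimal sketch: versioned_name is a made-up name, and where your app stores the current version number (a database column, a config file) is left open.

```ruby
# Hypothetical helper: insert a version number before the file extension,
# keeping any directory prefix, e.g. "images/flower.jpg" -> "images/flower.2.jpg".
def versioned_name(filename, version)
  ext  = File.extname(filename)
  base = File.basename(filename, ext)
  dir  = File.dirname(filename)
  name = "#{base}.#{version}#{ext}"
  dir == '.' ? name : File.join(dir, name)
end
```

For example, versioned_name('flower.jpg', 2) gives flower.2.jpg; upload the new picture under that key and emit links to it.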