S3OSCache is an OSCache store implementation that uses Amazon's S3 web service to store the cached data. If you currently use OSCache and store cached objects on disk or in memory, you can configure this add-on and start storing your cached objects on S3 instead.
You're probably asking why you would want to do this. Here is my situation: I have a website publishing system that dynamically resizes images for the user on demand. For example, it might create thumbnails or shrink a 2 megapixel photo so it can be displayed efficiently as a 400x300 pixel image.
Image resizing is a CPU-intensive activity, especially with the Java Imaging API. From the start, I used OSCache to cache these smaller versions of the images on the app server's file system. The problem is that I can have anywhere between 2 and 7 app servers running at a time, and each one keeps its own cache. This means that in the worst (and common) case, the same image gets resized 7 times across my cluster, once per server.
I saw what could be done with the OSCache storage API, so I took the Amazon S3 REST API and refactored it, since I needed to add HEAD method support and didn't need much else. I repackaged the REST API code so that it won't collide with the main Amazon S3 REST API jar if you're using that. The result is S3OSCache.
Here is what you need to get going:
- Java 1.5 or greater (I'm guessing it will work in 1.4, but I haven't tried)
- OSCache 2.3 or later
- Commons Logging 1.0.4 (included in OSCache)
- An account with S3
- A bucket and your AWS access information
- S3OSCache library
There is a sample program in the download. You use it by running:
java -jar s3oscache-sample-1.0.jar <bucket> <aws_access_key> <aws_secret_key>
This manually creates the OSCache manager with the appropriate settings and caches a string for up to a minute. You can run it over and over and see that the value persists across runs. You can also use S3Fox, a Firefox browser extension, to verify that the content is being stored.
If you're familiar with OSCache configuration, the setup is pretty easy. Here are the properties specific to S3OSCache's store:
|cache.persistence.class||com.amazon.s3oscache.S3OSCachePersistanceListener||Sets OSCache to use this library for persistence.|
|bucket||Your bucket name||Where to store the data.|
|aws_access_key_id||Your access key||Login credentials|
|aws_secret_access_key||Your secret key||Login credentials|
|serialization.safe||true or false (Default false)||If true, then s3oscache will serialize objects to a byte array before contacting s3. If false, then s3oscache will directly stream serialized data to s3.|
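As a rough illustration of the table above, here is a minimal sketch that assembles those settings into a java.util.Properties object. The class and method names (S3OSCacheConfigSketch, buildProperties) are made up for this example and are not part of S3OSCache; only the property keys and the listener class name come from the table. How you hand the properties to OSCache (an oscache.properties file, or a Properties object given to the cache administrator) depends on your setup and is not shown.

```java
import java.util.Properties;

// Hypothetical helper: collects the S3OSCache store settings in one place.
class S3OSCacheConfigSketch {

    static Properties buildProperties(String bucket,
                                      String accessKey,
                                      String secretKey,
                                      boolean serializationSafe) {
        Properties props = new Properties();
        // Tells OSCache to use the S3 store for persistence.
        props.setProperty("cache.persistence.class",
                "com.amazon.s3oscache.S3OSCachePersistanceListener");
        // Where to store the data.
        props.setProperty("bucket", bucket);
        // Login credentials.
        props.setProperty("aws_access_key_id", accessKey);
        props.setProperty("aws_secret_access_key", secretKey);
        // true: serialize to a byte array first; false: stream directly to S3.
        props.setProperty("serialization.safe", Boolean.toString(serializationSafe));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own bucket and credentials.
        Properties props = buildProperties("my-bucket", "ACCESS_KEY", "SECRET_KEY", false);
        props.list(System.out);
    }
}
```

The same five keys could just as well be written by hand into a properties file; the code form is only meant to make the names and expected values explicit.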
Since OSCache expects a unique key and S3 can take just about any key, I largely just use the key provided to store the data at S3. However, I do apply a transformation:
- Prefix key with "data/"
- Split the first 2 characters of the key apart with /
For example, an OSCache key of "myString" would become an S3 key of "data/m/y/String". While S3 does not impose directories, this is something that S3Fox handles nicely and made browsing the content very easy for my needs. This should really be configurable as people may not want the prefix or path splitting, or conversely might want automatic key hashing.
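The transformation described above can be sketched in a few lines. This is my reading of the two rules, not the library's actual code: the names S3KeySketch and toS3Key are hypothetical, and the handling of keys shorter than two characters is an assumption, since that case is not specified here.

```java
// Hypothetical sketch of the OSCache-key-to-S3-key mapping.
class S3KeySketch {

    static final String PREFIX = "data/";

    static String toS3Key(String cacheKey) {
        StringBuilder s3Key = new StringBuilder(PREFIX);
        // Assumption: keys shorter than 2 characters split as far as they can.
        int split = Math.min(2, cacheKey.length());
        // Each of the first two characters becomes its own path segment.
        for (int i = 0; i < split; i++) {
            s3Key.append(cacheKey.charAt(i)).append('/');
        }
        // The remainder of the key is appended unchanged.
        s3Key.append(cacheKey.substring(split));
        return s3Key.toString();
    }

    public static void main(String[] args) {
        System.out.println(toS3Key("myString")); // prints data/m/y/String
    }
}
```

Making PREFIX and the split depth parameters would be one way to get the configurability mentioned above.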
There are several things to consider when using S3OSCache. Certainly this is not appropriate for every caching situation, just like memory or disk is not appropriate for every cache situation.
- Each cache retrieval is an HTTP roundtrip, over the Internet, so the performance is going to depend on where you host your code. Here are some S3 HTTP performance numbers from a research project. I'm seeing acceptable performance at a major hosting company and plan to migrate to EC2 down the road, which should give improved networking performance.
- Connection setup is expensive. The first call to S3 requires a DNS lookup and a new TCP connection before the HTTP request can be sent. After that first request, thanks to HTTP Keep-Alive, you can hold that TCP connection open for a while and reuse it for additional requests. I've found this connection setup to take vastly more time than the actual commands. In my own tests, I see 400ms response times starting from scratch and 30ms response times when reusing an existing connection, so keepalives can have a dramatic performance impact.
Fortunately, Java's java.net.URL (through HttpURLConnection) does HTTP connection pooling with keepalives by default. You can read more about how it works and how to configure Java's HTTP Keep-Alive support.
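For reference, here is a small sketch of the two things that usually matter for connection reuse with Java's built-in HTTP client. The class and method names (KeepAliveSketch, configureKeepAlive, drainAndClose) are made up for this example. The system properties http.keepAlive (default true) and http.maxConnections (default 5) are the standard JDK networking properties; they should be set before the first HTTP request is made. Reading each response body to the end and closing the stream is what lets the JDK return the connection to its pool.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helpers illustrating Java's HTTP keep-alive knobs.
class KeepAliveSketch {

    // Set the JDK keep-alive properties; call before any HTTP request is made.
    static void configureKeepAlive(int maxConnectionsPerHost) {
        System.setProperty("http.keepAlive", "true");
        System.setProperty("http.maxConnections",
                Integer.toString(maxConnectionsPerHost));
    }

    // Read a response stream to the end and close it so the underlying
    // connection can be reused. Returns the number of bytes read.
    static long drainAndClose(InputStream in) {
        long total = 0;
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        configureKeepAlive(7); // e.g. one pooled connection per app server thread
        // Stand-in for connection.getInputStream() on a real response.
        InputStream fakeBody = new ByteArrayInputStream(new byte[10000]);
        System.out.println(drainAndClose(fakeBody)); // prints 10000
    }
}
```

In real use, drainAndClose would be applied to the stream returned by HttpURLConnection.getInputStream() (or getErrorStream() on a failed request) rather than to an in-memory stream.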
Feel free to drop me an email at firstname.lastname@example.org.