S3OSCache
Description
S3OSCache is an OSCache store implementation that uses Amazon Web Services' S3 service to store the cached data. In other words, if you are currently using OSCache to store cached objects on disk or in memory, you can easily configure this add-on to OSCache and start storing your cached objects on S3.
Motivation
You're probably asking why you would want to do this. Here is my situation: I have a website publishing system that dynamically resizes images for users on demand. For example, it might create thumbnails or simply shrink a 2-megapixel photo so it can be displayed efficiently as a 400x300 pixel image. Image resizing is CPU-intensive, especially with the Java Imaging API. From the start, I used OSCache to cache these smaller versions of the images on each app server's file system. The problem is that I can have anywhere between 2 and 7 app servers running at a time, each with its own disk cache. This means that in the worst (and common) case, the same image gets resized 7 times across my cluster.
Having seen what could be done with OSCache's persistence API, I took the Amazon S3 REST API code and refactored it, since I needed to add HEAD method support and didn't need much else. I repackaged the REST API code so that it won't collide with the main Amazon S3 REST API jar if you're also using that. The result is s3oscache.
Requirements
Here is what you need to get going:
- Java 1.5 or greater (I'm guessing it will work in 1.4, but haven't tried)
- OSCache 2.3 or later
- Commons Logging 1.0.4 (included with OSCache)
- An account with S3
- A bucket and your AWS access information
- The S3OSCache library
Download
You can download s3oscache-1.0.zip or s3oscache-1.0.tgz.
This contains some sample code, the source code, a build script,
and the jar you need.
Getting Started
The download includes a sample program. Run it with:
    java -jar s3oscache-sample-1.0.jar <bucket> <aws_access_key> <aws_secret_key>
This manually creates the OSCache manager with the appropriate settings and caches a string for up to a minute. You can run it repeatedly and see that the string persists across runs. You can also use S3Fox, a Firefox browser extension, to confirm that the content is being stored.
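The sample program creates the OSCache manager by hand. A minimal sketch of assembling the same settings in code, using only java.util.Properties and the property names from the table in this section, might look like the following. The class and method names (S3CacheConfigSketch, buildConfig) and all values are hypothetical placeholders, and handing the result to OSCache's GeneralCacheAdministrator(Properties) constructor appears only in a comment so the sketch compiles without OSCache on the classpath.

```java
import java.io.IOException;
import java.util.Properties;

final class S3CacheConfigSketch {
    // Hypothetical helper: collects the S3OSCache-specific settings from the property table.
    static Properties buildConfig(String bucket, String accessKey, String secretKey,
                                  boolean serializationSafe) {
        Properties props = new Properties();
        // Route OSCache persistence through S3OSCache (class name as documented).
        props.setProperty("cache.persistence.class",
                "com.amazon.s3oscache.S3OSCachePersistanceListener");
        props.setProperty("bucket", bucket);
        props.setProperty("aws_access_key_id", accessKey);
        props.setProperty("aws_secret_access_key", secretKey);
        props.setProperty("serialization.safe", String.valueOf(serializationSafe));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values only; never print real credentials like this.
        Properties props = buildConfig("my-bucket", "MY_ACCESS_KEY", "MY_SECRET_KEY", false);
        // With OSCache on the classpath, these could be passed to
        // new com.opensymphony.oscache.general.GeneralCacheAdministrator(props).
        props.store(System.out, "S3OSCache settings");
    }
}
```

The same key/value pairs could equally live in an oscache.properties file; building them in code just makes it easy to pull the bucket and credentials from command-line arguments, as the sample program does.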
If you're familiar with OSCache configuration, the setup is
pretty easy. Here are the properties specific to S3OSCache's
store:
| Property name | Value | Description |
| cache.persistence.class | com.amazon.s3oscache.S3OSCachePersistanceListener | Sets OSCache to use this library for persistence. |
| bucket | Your bucket name | Where to store the data. |
| aws_access_key_id | Your access key | Login credentials. |
| aws_secret_access_key | Your secret key | Login credentials. |
| serialization.safe | true or false (default: false) | If true, s3oscache serializes objects to a byte array before contacting S3. If false, s3oscache streams the serialized data directly to S3. |
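The page does not describe how s3oscache implements the two serialization.safe modes, but the "safe" mode's first step, serializing to a byte array, can be sketched with standard Java serialization. The class and method names below are hypothetical. One plausible reading of why this is "safe" (an assumption, not stated in the source): a NotSerializableException surfaces before any request is sent to S3, and the payload size is known up front, at the cost of holding the whole serialized object in memory. Streaming (false) avoids that buffer but could fail partway through an upload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class SafeSerializationSketch {
    // Serializes the whole object graph into memory. If any part of it is not
    // Serializable, this throws here, before any network call would be made.
    static byte[] serializeToBytes(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buffer);
        try {
            out.writeObject(value);
        } finally {
            out.close();
        }
        return buffer.toByteArray();
    }

    // Reverses serializeToBytes, e.g. after downloading the bytes back from S3.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serializeToBytes("cached thumbnail metadata");
        System.out.println("serialized " + bytes.length + " bytes");
        System.out.println(deserialize(bytes));
        try {
            serializeToBytes(new Object()); // java.lang.Object is not Serializable
        } catch (NotSerializableException e) {
            System.out.println("rejected before upload: " + e.getMessage());
        }
    }
}
```

The explicit try/finally blocks (rather than try-with-resources) keep the sketch compatible with the Java 1.5 baseline listed in the requirements.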
Key naming
Since OSCache expects a unique key and S3 can take just about any key, I largely use the key provided by OSCache as the S3 key. However, I do apply a transformation:
- Prefix the key with "data/"
- Split the first two characters of the key off into their own path segments, separated by /
For example, an OSCache key of "myString" becomes the S3 key "data/m/y/String". While S3 does not have real directories, S3Fox handles these /-separated paths nicely, which made browsing the content very easy for my needs. This should really be configurable, as people may not want the prefix or path splitting, or conversely might want automatic key hashing.
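The documented transformation can be sketched as a small helper. This is an illustration, not s3oscache's actual code: toS3Key is a hypothetical name, and the handling of keys two characters or shorter is an assumption, since the page doesn't say. toHashedS3Key shows one way the automatic key hashing mentioned above could work (an MD5 hex digest followed by the same splitting); s3oscache does not provide it.

```java
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class S3KeySketch {
    private static final String PREFIX = "data/";

    // Mirrors the documented scheme: "data/" + 1st char + "/" + 2nd char + "/" + rest.
    static String toS3Key(String cacheKey) {
        if (cacheKey.length() <= 2) {
            // Assumption: very short keys are stored directly under the prefix.
            return PREFIX + cacheKey;
        }
        return PREFIX + cacheKey.charAt(0) + "/" + cacheKey.charAt(1) + "/"
                + cacheKey.substring(2);
    }

    // Hypothetical variant: hash first, so arbitrary keys become fixed-length,
    // evenly distributed hex strings before the same splitting is applied.
    static String toHashedS3Key(String cacheKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(cacheKey.getBytes("UTF-8"));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return toS3Key(hex.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toS3Key("myString")); // data/m/y/String
        System.out.println(toHashedS3Key("myString"));
    }
}
```

Hashing trades human-readable paths in S3Fox for keys that are always URL-safe and spread evenly across the first two path segments.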
Performance
There are several things to consider when using S3OSCache. It is certainly not appropriate for every caching situation, just as memory or disk caching is not appropriate for every situation.
- Each cache retrieval is an HTTP round trip over the Internet, so performance is going to depend on where you host your code. A research project has published some S3 HTTP performance numbers. I'm seeing acceptable performance at a major hosting company and plan to migrate to EC2 down the road, which should give improved network performance.
- Connection setup. The first call to S3 requires a DNS lookup and a new TCP connection before the HTTP request can be sent. After that first request, thanks to HTTP keep-alive, the TCP connection can be held open for a while and reused for additional requests. I've found this connection setup to take vastly more time than the actual commands: in my personal tests, I see 400ms response times when starting from scratch and 30ms when reusing an existing connection. Connection keep-alive can clearly have a dramatic performance impact.
Fortunately, Java's java.net.URL (through HttpURLConnection) does HTTP connection pooling with keep-alive by default. You can read more about how it works and how to configure Java's HTTP Keep-Alive support.
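The keep-alive reuse described above can be observed locally. This sketch stands in for S3 with the JDK's built-in com.sun.net.httpserver server (Java 6 or later, newer than the Java 1.5 baseline in the requirements) and records the client-side port of each request; a repeated port means java.net.HttpURLConnection reused the TCP connection. The key detail is reading each response body to the end and closing the stream, rather than calling disconnect(), so the connection can return to the JDK's keep-alive cache. The class name, the /thumb path, and the port-recording approach are illustrative assumptions, not part of s3oscache.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class KeepAliveSketch {
    // Client-side (ephemeral) port the server saw for each request.
    static final List<Integer> clientPorts = new ArrayList<Integer>();

    // Issues a GET and drains the body fully; closing the stream (not calling
    // disconnect()) lets HttpURLConnection keep the socket for reuse.
    static int fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int status = conn.getResponseCode();
        InputStream in = conn.getInputStream();
        try {
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // discard the body; we only care that it was read to the end
            }
        } finally {
            in.close();
        }
        return status;
    }

    // Starts a local server, sends `requests` GETs to it, and returns the client ports seen.
    static List<Integer> run(int requests) throws IOException {
        synchronized (clientPorts) { clientPorts.clear(); }
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", new HttpHandler() {
            public void handle(HttpExchange ex) throws IOException {
                synchronized (clientPorts) { clientPorts.add(ex.getRemoteAddress().getPort()); }
                byte[] body = "ok".getBytes("UTF-8");
                ex.sendResponseHeaders(200, body.length); // fixed Content-Length
                OutputStream os = ex.getResponseBody();
                os.write(body);
                os.close();
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/thumb");
            for (int i = 0; i < requests; i++) {
                fetch(url);
            }
        } finally {
            server.stop(0);
        }
        synchronized (clientPorts) { return new ArrayList<Integer>(clientPorts); }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("client ports: " + run(3));
    }
}
```

The standard http.keepAlive (default true) and http.maxConnections (default 5, the number of idle connections kept per destination) system properties control this pooling; leaving them at their defaults is usually enough for a single S3 endpoint.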
Feedback
Feel free to drop me an email at sergek@lokitech.com.