Caching Tutorial for Web Authors and Webmasters
- What’s a Web Cache? Why do people use them?
- Kinds of Web Caches
- Browser Caches
- Proxy Caches
- Aren’t Web Caches bad for me? Why should I help them?
- How Web Caches Work
- How (and how not) to Control Caches
- HTML Meta Tags vs. HTTP Headers
- Pragma HTTP Headers (and why they don’t work)
- Controlling Freshness with the Expires HTTP Header
- Cache-Control HTTP Headers
- Validators and Validation
- Tips for Building a Cache-Aware Site
- Writing Cache-Aware Scripts
- Frequently Asked Questions
- Implementation Notes: Web Servers
- Implementation Notes: Server-Side Scripting
- References and Further Information
- About This Document

What’s a Web Cache? Why do people use them?
A Web cache sits between one or more Web servers (also known as
origin servers) and a client or many clients, and watches requests
come by, saving copies of the responses (such as HTML pages, images and files,
collectively known as representations) for itself. Then, if there
is another request for the same URL, it can use the response that it has,
instead of asking the origin server for it again.

There are two main reasons that Web caches are used:
- To reduce latency: because the request is satisfied
  from the cache (which is closer to the client) instead of the origin server,
  the client takes less time to get the representation and display it. This
  makes the Web seem more responsive.
- To reduce network traffic: because representations are
  reused, less bandwidth is used by a client. This saves
  money if the client is paying for traffic, and keeps their bandwidth
  requirements lower and more manageable.

Kinds of Web Caches

Browser Caches
If you examine the preferences dialog of any modern Web browser (like
Internet Explorer, Safari or Mozilla), you’ll probably notice a “cache”
setting. This lets you set aside a section of your computer’s hard disk to
store representations that you’ve seen, just for you. The browser cache works
according to fairly simple rules. It will check to make sure that the
representations are fresh, usually once a session (that is, once in the
current invocation of the browser).

This cache is especially useful when users hit the “back” button or click a
link to see a page they’ve just looked at. Also, if you use the same
navigation images throughout your site, they’ll be served from browsers’
caches almost instantaneously.

Proxy Caches
Web proxy caches work on the same principle, but on a much larger scale.
Proxies serve hundreds or thousands of users in the same way; large
corporations and ISPs often set them up on their firewalls, or as standalone
devices (also known as intermediaries).

Because proxy caches aren’t part of the client or the origin server, but
instead are out on the network, requests have to be routed to them somehow.
One way to do this is to use your browser’s proxy setting to manually tell it
what proxy to use; another is using interception. Interception
proxies have Web requests redirected to them by the underlying
network itself, so that clients don’t need to be configured for them, or even
know about them.

Proxy caches are a type of shared cache; rather than just having
one person using them, they usually have a large number of users, and because
of this they are very good at reducing latency and network traffic. That’s
because popular representations are reused a number of times.

Gateway Caches
Also known as “reverse proxy caches” or “surrogate caches,” gateway caches
are also intermediaries, but instead of being deployed by network
administrators to save bandwidth, they’re typically deployed by Webmasters
themselves, to make their sites more scalable, reliable and better
performing.

Requests can be routed to gateway caches by a number of methods, but
typically some form of load balancer is used to make one or more of them look
like the origin server to clients.

Content delivery networks (CDNs) distribute gateway caches
throughout the Internet (or a part of it) and sell caching to interested Web
sites. Speedera and Akamai are examples of
CDNs.

This tutorial focuses mostly on browser and proxy caches, although some of
the information is suitable for those interested in gateway caches as
well.

Aren’t Web Caches bad for me? Why should I help them?
Web caching is one of the most misunderstood technologies on the Internet.
Webmasters in particular fear losing control of their site, because a proxy
cache can “hide” their users from them, making it difficult to see who’s using
the site.

Unfortunately for them, even if Web caches didn’t exist, there are too many
variables on the Internet to ensure that they’ll be able to get an accurate
picture of how users see their site. If this is a big concern for you, this
tutorial will teach you how to get the statistics you need without making your
site cache-unfriendly.

Another concern is that caches can serve content that is out of date, or
stale. However, this tutorial can show you how to configure your
server to control how your content is cached.

CDNs are an interesting development, because unlike many
proxy caches, their gateway caches are aligned with the interests of the
Web site being cached, so that these problems aren’t seen. However, even
when you use a CDN, you still have to consider that there will be proxy
and browser caches downstream.

On the other hand, if you plan your site well, caches can help your Web
site load faster, and save load on your server and Internet link. The
difference can be dramatic; a site that is difficult to cache may take
several seconds to load, while one that takes advantage of caching can seem
instantaneous in comparison. Users will appreciate a fast-loading site, and
will visit more often.

Think of it this way: many large Internet companies are spending millions
of dollars setting up farms of servers around the world to replicate their
content, in order to make it as fast to access as possible for their users.
Caches do the same for you, and they’re even closer to the end user. Best of
all, you don’t have to pay for them.

The fact is that proxy and browser caches will be used whether you like it
or not. If you don’t configure your site to be cached correctly, it will be
cached using whatever defaults the cache’s administrator decides upon.

How Web Caches Work
All caches have a set of rules that they use to determine when to serve a
representation from the cache, if it’s available. Some of these rules are set
in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of
the cache (either the user of the browser cache, or the proxy
administrator).

Generally speaking, these are the most common rules that are followed
(don’t worry if you don’t understand the details; they will be explained
below):
- If the response’s headers tell the cache not to keep it,
  it won’t.
- If the request is authenticated or secure (i.e., HTTPS), it won’t be
  cached by shared caches.
- A cached representation is considered fresh (that is, able to
  be sent to a client without checking with the origin server) if:
  - It has an expiry time or other age-controlling header set, and is
    still within the fresh period, or
  - The cache has seen the representation recently, and it was
    modified relatively long ago.
  Fresh representations are served directly from the cache, without checking
  with the origin server.
- If a representation is stale, the origin server will be asked to
  validate it, or tell the cache whether the copy that it has is
  still good.
- Under certain circumstances (for example, when it’s disconnected from a network),
  a cache can serve stale responses without checking with the origin server.

If no validator (an ETag or Last-Modified header) is
present on a response, and it doesn’t have any explicit freshness information,
it will usually, but not always, be considered uncacheable.
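
The rules above can be read as a small decision procedure. What follows is only a minimal sketch in Python, not how any particular cache is implemented. The `CachedResponse` record, the function names and the `HEURISTIC_FRACTION` value are all assumptions made for illustration; real caches choose their own heuristics and handle many more cases (shared versus private caches, serving stale responses when disconnected, and so on).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CachedResponse:
    """Hypothetical record a cache might keep for one stored response."""
    stored_at: datetime                        # when the cache saved its copy
    expires: Optional[datetime] = None         # explicit expiry time, if any
    last_modified: Optional[datetime] = None   # validator: Last-Modified
    etag: Optional[str] = None                 # validator: ETag
    no_store: bool = False                     # headers said "do not keep it"


# Assumed value for illustration only; each cache picks its own heuristic.
HEURISTIC_FRACTION = 0.1


def is_storable(resp: CachedResponse) -> bool:
    """Keep a response unless told not to, or unless it has neither a
    validator nor explicit freshness information."""
    if resp.no_store:
        return False
    has_validator = resp.etag is not None or resp.last_modified is not None
    has_freshness = resp.expires is not None
    return has_validator or has_freshness


def is_fresh(resp: CachedResponse, now: datetime) -> bool:
    """Fresh means it can be served without asking the origin server."""
    if resp.expires is not None:
        # Explicit expiry time: fresh while still within the fresh period.
        return now < resp.expires
    if resp.last_modified is not None:
        # Heuristic: seen recently, and modified relatively long ago.
        age_when_stored = resp.stored_at - resp.last_modified
        return now - resp.stored_at < age_when_stored * HEURISTIC_FRACTION
    return False


def decide(resp: CachedResponse, now: datetime) -> str:
    """Return 'serve', 'validate' or 'fetch' for a stored response."""
    if not is_storable(resp):
        return "fetch"
    if is_fresh(resp, now):
        return "serve"
    return "validate"


now = datetime(2024, 1, 10, 12, 0)
page = CachedResponse(stored_at=now - timedelta(hours=1),
                      last_modified=now - timedelta(days=30))
print(decide(page, now))  # stored an hour ago, last changed a month ago
```

In this sketch, a page stored an hour ago and last modified thirty days earlier is served from the cache, because one hour is well under a tenth of its age when stored. The same page with an `expires` time in the past would instead be sent back to the origin server for validation.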