Understanding URL Percent-Encoding Standards
The Uniform Resource Locator (URL) is the addressing system used to locate resources on the web. Under the Internet Engineering Task Force (IETF) URI syntax specification, RFC 3986, a URL may contain only a limited set of ASCII characters. Any character outside this set must be percent-encoded before the URL is transmitted.
Reserved vs. Unreserved Characters
Characters in URLs are classified into two primary categories:
- Unreserved: letters (A-Z, a-z), digits (0-9), hyphen (-), underscore (_), period (.), and tilde (~). These never require encoding.
- Reserved: characters that have special syntactic meaning, such as ? for the query string, & for separating parameters, and / for path segments. When they are used as data values rather than as delimiters, they must be encoded.
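As an illustration of the two categories, here is a minimal Python sketch that classifies a single character. It assumes the character sets defined in RFC 3986 (unreserved in section 2.3; reserved as the union of "gen-delims" and "sub-delims" in section 2.2). The classify function and the constant names are hypothetical, chosen for this example only, and are not part of any library.

```python
import string

# Unreserved set (RFC 3986, section 2.3): ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# Reserved set (RFC 3986, section 2.2): gen-delims plus sub-delims
GEN_DELIMS = frozenset(":/?#[]@")
SUB_DELIMS = frozenset("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def classify(ch: str) -> str:
    """Return 'unreserved', 'reserved', or 'other' for one character.

    Hypothetical helper for illustration, not a standard-library API.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Everything else (space, '%', any non-ASCII character) must be encoded.
    return "other"


if __name__ == "__main__":
    for ch in ["a", "7", "~", "?", "&", "/", " ", "é"]:
        print(repr(ch), classify(ch))
```

A character classified as "other" always needs encoding; a "reserved" one needs encoding only when it appears as data rather than as a delimiter.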
How Percent Encoding Works
Percent-encoding first converts a character into its byte sequence (typically UTF-8 for non-ASCII text) and then writes each byte as a percent sign (%) followed by the byte's two-digit hexadecimal value. For example:
- A space character is encoded as %20.
- An ampersand (&) is encoded as %26.
- A slash (/) is encoded as %2F.
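The byte-level procedure described above can be sketched in Python as follows. This is a minimal sketch, assuming UTF-8 as the character encoding and treating every character outside the unreserved set as needing encoding (the strictest choice, appropriate when a value is embedded as data). The percent_encode function is hypothetical; the cross-check uses the standard library's urllib.parse.quote and unquote.

```python
from urllib.parse import quote, unquote

# Unreserved characters (RFC 3986, section 2.3); everything else gets encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then emit each non-unreserved byte as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)  # safe ASCII byte, copied as-is
        else:
            out.append(f"%{byte:02X}")  # '%' plus two uppercase hex digits
    return "".join(out)


if __name__ == "__main__":
    print(percent_encode(" "))     # %20
    print(percent_encode("&"))     # %26
    print(percent_encode("/"))     # %2F
    print(percent_encode("café"))  # caf%C3%A9 (é is the two UTF-8 bytes C3 A9)

    # Cross-check against the standard library.
    assert percent_encode("a b&c/d") == quote("a b&c/d", safe="")
    assert unquote(percent_encode("café")) == "café"
```

Note that quote leaves / unescaped by default (its safe parameter defaults to "/"), so passing safe="" is what makes it encode a slash that is part of the data.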