Why RFC 3986 Matters
RFC 3986 defines the generic syntax for Uniform Resource Identifiers. In practical web development, it explains which characters can appear in URLs as-is and which characters need percent-encoding.
Percent-encoding replaces a byte with % followed by two hexadecimal digits. A space becomes %20, a slash used as data becomes %2F, and the percent sign itself becomes %25.
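The three examples above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch; the helper name percent_encode_all is made up for illustration, and safe="" is passed because quote() leaves "/" unencoded by default.

```python
from urllib.parse import quote, unquote

def percent_encode_all(text: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters."""
    # safe="" overrides quote()'s default of leaving "/" untouched.
    return quote(text, safe="")

print(percent_encode_all(" "))  # %20
print(percent_encode_all("/"))  # %2F
print(percent_encode_all("%"))  # %25

# Decoding reverses the process byte by byte.
print(repr(unquote("%20%2F%25")))  # ' /%'
```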
Unreserved Characters
Unreserved characters can appear in a URL without encoding:
A-Z a-z 0-9 - . _ ~
These characters have no special delimiter role in the generic URL syntax. Encoding them is usually unnecessary. For example, hello-world_2026 is already safe.
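As a rough illustration, the unreserved set is small enough to check by hand. The sketch below assumes Python 3.7 or later, where quote() treats ~ as unreserved; needs_encoding is a hypothetical helper, not a library function.

```python
import string
from urllib.parse import quote

# The RFC 3986 unreserved set: letters, digits, and - . _ ~
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def needs_encoding(text: str) -> bool:
    """Return True if text contains any character outside the unreserved set."""
    return any(ch not in UNRESERVED for ch in text)

print(needs_encoding("hello-world_2026"))  # False
print(needs_encoding("hello world"))       # True

# quote() leaves an all-unreserved string unchanged.
print(quote("hello-world_2026", safe=""))  # hello-world_2026
```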
Reserved Characters
Reserved characters can have special meaning:
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
Reserved characters are not wrong in themselves. They cause problems only when they are used as data in a place where they could be interpreted as syntax. A slash between path segments should remain /; a slash inside a product ID should become %2F.
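One way to keep the two kinds of slash apart is to encode each path segment separately and only then join them. The sketch below assumes a hypothetical product ID AB/123 and a made-up helper build_path. It is deliberately conservative: safe="" also encodes characters such as : and @ that RFC 3986 allows unencoded inside a path segment.

```python
from urllib.parse import quote

def build_path(*segments: str) -> str:
    """Join path segments, encoding each one so data slashes become %2F."""
    # The "/" separators are added here as syntax; any "/" inside a
    # segment is data and gets percent-encoded by quote(..., safe="").
    return "/" + "/".join(quote(segment, safe="") for segment in segments)

print(build_path("products", "AB/123"))  # /products/AB%2F123
```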
Component Matters More Than Character
Encoding rules depend on where the value will be placed:
| Component | Example | Encoding rule |
|---|---|---|
| Scheme | https | Do not encode |
| Host | example.com | Use IDNA/punycode for international domains |
| Path segment | /docs/my file | Encode spaces and data slashes |
| Query value | q=a&b | Encode & as %26 |
| Fragment | #section 1 | Encode spaces if generated programmatically |
The same character can be valid syntax in one component and unsafe data in another.
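A small sketch of component-wise encoding, using the example values from the table. The function build_url and its parameters are assumptions for illustration; the idea is simply that each component is encoded on its own before the URL is assembled.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_url(host, segments, params, fragment):
    """Assemble an https URL, encoding each component separately."""
    # Path: encode each segment, then join with syntactic slashes.
    path = "/" + "/".join(quote(s, safe="") for s in segments)
    # Query: quote_via=quote yields %20 for spaces instead of "+".
    query = urlencode(params, quote_via=quote)
    # Fragment: encoded like any other data value.
    frag = quote(fragment, safe="")
    # urlunsplit only joins the parts; it does not encode them again.
    return urlunsplit(("https", host, path, query, frag))

url = build_url("example.com", ["docs", "my file"], {"q": "a&b"}, "section 1")
print(url)  # https://example.com/docs/my%20file?q=a%26b#section%201
```

Encoding before assembly avoids having to guess later whether a given & or / was meant as syntax or as data.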
UTF-8 Comes First
Percent-encoding operates on bytes, not abstract characters. In modern URLs, a character is first converted to its UTF-8 bytes, and each byte is then percent-encoded. The character é becomes two bytes, C3 A9, which are written as %C3%A9.
Emoji work the same way. The character is encoded to its UTF-8 byte sequence, then each byte becomes a percent triplet.
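The two steps can be made explicit in a few lines. This sketch uses a hypothetical helper, utf8_percent_encode, that encodes every byte, including unreserved ones, purely to show the mechanism; real code would normally call quote(), which skips unreserved characters. The characters are written as escapes so the code points are unambiguous.

```python
from urllib.parse import quote

def utf8_percent_encode(text: str) -> str:
    """Encode text to UTF-8, then write every byte as an uppercase %XX triplet."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

e_acute = "\u00e9"    # the character é
emoji = "\U0001F600"  # a grinning-face emoji

print(e_acute.encode("utf-8").hex())  # c3a9
print(utf8_percent_encode(e_acute))   # %C3%A9
print(utf8_percent_encode(emoji))     # %F0%9F%98%80

# quote() applies the same UTF-8 step by default.
print(quote(e_acute, safe=""))        # %C3%A9
```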
Uppercase vs Lowercase Hex
The hex digits in a percent triplet are case-insensitive, so %2f and %2F represent the same byte. RFC 3986 recommends uppercase hex digits for consistency, and uppercase is also easier to read in generated URLs.
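If incoming URLs mix both cases, a regular expression can rewrite the triplets to uppercase without touching anything else. The helper name uppercase_escapes is an assumption for illustration.

```python
import re
from urllib.parse import unquote

# Matches a well-formed percent triplet such as %2f or %C3.
_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")

def uppercase_escapes(url: str) -> str:
    """Uppercase the hex digits of every percent triplet; leave the rest alone."""
    return _TRIPLET.sub(lambda m: m.group(0).upper(), url)

# Both spellings decode to the same character.
print(unquote("%2f") == unquote("%2F"))   # True
print(uppercase_escapes("/a%2fb%c3%a9"))  # /a%2Fb%C3%A9
```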
Normalization and Canonical URLs
Consistent encoding helps canonicalization. Search engines and caches may see differently encoded URLs as separate resources if the server treats them differently. For public links, choose one canonical form and redirect or generate links consistently.
A good rule is to keep unreserved characters unencoded, encode data in each component, and avoid double encoding.
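Two parts of that rule are easy to demonstrate. The sketch below, with a hypothetical helper decode_unreserved, decodes only the triplets that stand for unreserved characters, and then shows what double encoding looks like so it can be recognized in logs.

```python
import re
from urllib.parse import quote

_UNRESERVED = re.compile(r"[A-Za-z0-9\-._~]")
_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_unreserved(url: str) -> str:
    """Decode only the triplets that stand for unreserved characters."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        # Keep the triplet if it encodes anything other than an unreserved character.
        return ch if _UNRESERVED.fullmatch(ch) else match.group(0)
    return _TRIPLET.sub(repl, url)

print(decode_unreserved("/%7Euser/a%2Fb"))  # /~user/a%2Fb

# Double encoding: the % of an existing triplet is itself encoded as %25.
once = quote("a b", safe="")
twice = quote(once, safe="")
print(once)   # a%20b
print(twice)  # a%2520b
```

A %25 followed by two hex digits in a generated link is often a sign that a value was encoded twice somewhere along the way.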
Common RFC 3986 Pitfalls
% must be encoded as %25 when it is a literal percent sign. If you output 100% organic in a URL without encoding the percent sign, a parser may read the % and the two characters after it (a space and the letter o) as an escape sequence, which in this case is invalid.
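A quick check for stray percent signs can catch this before a link is published. The sketch below uses a made-up helper, has_invalid_escape, which flags any % that is not followed by two hex digits.

```python
import re
from urllib.parse import quote

# A "%" that is NOT followed by exactly two hex digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_invalid_escape(text: str) -> bool:
    """Return True if text contains a % that does not start a valid triplet."""
    return _BAD_PERCENT.search(text) is not None

raw = "100% organic"
encoded = quote(raw, safe="")

print(has_invalid_escape(raw))      # True
print(encoded)                      # 100%25%20organic
print(has_invalid_escape(encoded))  # False
```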
+ is not a general replacement for spaces in every URL component. It is a form-encoding convention for query strings and request bodies using application/x-www-form-urlencoded.
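Python's urllib.parse exposes both conventions side by side, which makes the difference easy to see. In this sketch, quote and unquote follow plain percent-encoding, while quote_plus and unquote_plus follow the form-encoding convention.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

percent_style = quote("a b", safe="")  # generic percent-encoding
form_style = quote_plus("a b")         # form-encoding convention
literal_plus = quote_plus("a+b")       # a real "+" must be escaped in form data

print(percent_style)  # a%20b
print(form_style)     # a+b
print(literal_plus)   # a%2Bb

# Decoding depends on which convention the sender used.
print(unquote("a+b"))       # a+b  (plus stays a plus)
print(unquote_plus("a+b"))  # a b  (plus becomes a space)
```

Decoding a path with the form-style decoder, or a form body with the generic one, silently changes the data, so the decoder should match the component being read.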
Frequently Asked Questions
Are URLs ASCII only?
The transmitted URL form is effectively ASCII-safe because non-ASCII characters are encoded. Browsers may display readable Unicode, but they serialize it to a safe encoded form for requests.
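As an illustration, the sketch below converts a hypothetical international host and path to their ASCII forms. It assumes Python's built-in idna codec, which implements the older IDNA 2003 rules and is enough for this simple example; browsers follow their own, more detailed processing.

```python
from urllib.parse import quote

# Host: converted label by label to its ASCII (Punycode) form.
host = "b\u00fccher.example".encode("idna").decode("ascii")  # bücher.example

# Path: UTF-8 bytes, then percent-encoding; "/" stays as syntax here.
path = quote("/caf\u00e9")  # /café

print(host)                     # xn--bcher-kva.example
print(path)                     # /caf%C3%A9
print((host + path).isascii())  # True
```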
Should I encode tilde?
No. ~ is an unreserved character in RFC 3986 and does not need encoding.
Is percent-encoding encryption?
No. Percent-encoding is reversible formatting for transport. It does not hide or protect data.
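A short round trip makes the point. The value below is a made-up example; anyone who sees the encoded string can recover the original with a single call and no key.

```python
from urllib.parse import quote, unquote

secret = "user=alice&token=s3cret"
encoded = quote(secret, safe="")

print(encoded)           # user%3Dalice%26token%3Ds3cret
print(unquote(encoded))  # user=alice&token=s3cret
```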