RFC 3986 Percent-Encoding: Reserved and Unreserved Characters


Why RFC 3986 Matters

RFC 3986 defines the generic syntax for Uniform Resource Identifiers. In practical web development, it explains which characters can appear in URLs as-is and which characters need percent-encoding.

Percent-encoding replaces a byte with % followed by two hexadecimal digits. A space becomes %20, a slash used as data becomes %2F, and the percent sign itself becomes %25.
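
The three examples above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch; passing safe="" is a choice made here so that the slash is encoded too, since quote leaves / untouched by default.

```python
from urllib.parse import quote, unquote

# safe="" means no reserved character is exempt from encoding
# (the default, safe="/", would pass the slash through unchanged).
for ch in [" ", "/", "%"]:
    print(repr(ch), "->", quote(ch, safe=""))

# Decoding reverses the mapping.
assert unquote("%20%2F%25") == " /%"
```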

Unreserved Characters

Unreserved characters can appear in a URL without encoding:

A-Z a-z 0-9 - . _ ~

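These characters have no delimiter role in the generic URL syntax. RFC 3986 says they should not be percent-encoded, and a URL that does encode one (for example %7E for ~) is equivalent to the same URL with the plain character. For example, hello-world_2026 is already safe.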
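
As a quick check, the sketch below confirms that an encoder leaves the unreserved set alone. It assumes Python 3.7 or later, where urllib.parse.quote follows RFC 3986 and treats ~ as unreserved; the UNRESERVED name is introduced here for illustration.

```python
import string
from urllib.parse import quote

# The unreserved set from RFC 3986: letters, digits, and - . _ ~
UNRESERVED = string.ascii_letters + string.digits + "-._~"

# Even with safe="" (nothing extra exempted), unreserved characters pass through.
assert quote(UNRESERVED, safe="") == UNRESERVED
assert quote("hello-world_2026", safe="") == "hello-world_2026"
```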

Reserved Characters

Reserved characters can have special meaning:

: / ? # [ ] @ ! $ & ' ( ) * + , ; =

Reserved characters are legal in a URL. They need encoding only when they appear as data in a component where they would otherwise be read as a delimiter. A slash between path segments should remain /; a slash inside a product ID should become %2F.
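
One way to keep syntax slashes and data slashes apart is to encode each path segment separately and only then join the segments. The build_path helper and the product ID below are hypothetical, used only to illustrate the idea.

```python
from urllib.parse import quote

def build_path(*segments: str) -> str:
    """Join path segments, encoding each one so a data slash cannot split it."""
    return "/" + "/".join(quote(seg, safe="") for seg in segments)

# "AC/DC-42" is a made-up product ID that contains a slash.
print(build_path("products", "AC/DC-42"))  # /products/AC%2FDC-42
```

The slashes added by the join remain delimiters, while the slash inside the ID travels as %2F.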

Component Matters More Than Character

Encoding rules depend on where the value will be placed:

Component    | Example       | Encoding rule
Scheme       | https         | Do not encode
Host         | example.com   | Use IDNA/Punycode for international domains
Path segment | /docs/my file | Encode spaces and data slashes
Query value  | q=a&b         | Encode & as %26
Fragment     | #section 1    | Encode spaces as %20

The same character can be valid syntax in one component and unsafe data in another.
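
The per-component rules in the table can be sketched by encoding each part on its own and assembling the URL afterwards. The example reuses the table's sample values; passing quote_via=quote to urlencode is a choice made here so that spaces in query values would come out as %20 rather than +.

```python
from urllib.parse import quote, urlencode, urlunsplit

scheme = "https"                                   # never encoded
host = "example.com"                               # plain ASCII host, nothing to do
path = "/docs/" + quote("my file", safe="")        # encode one path segment
query = urlencode({"q": "a&b"}, quote_via=quote)   # encode one query pair
fragment = quote("section 1", safe="")             # encode the fragment text

url = urlunsplit((scheme, host, path, query, fragment))
print(url)  # https://example.com/docs/my%20file?q=a%26b#section%201
```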

UTF-8 Comes First

Percent-encoding operates on bytes, not abstract characters. A non-ASCII character is first converted to its UTF-8 byte sequence, and each of those bytes is then percent-encoded. The character é becomes two bytes, C3 A9, and then %C3%A9.

Emoji work the same way. The character is encoded to its UTF-8 byte sequence, then each byte becomes a percent triplet.
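
The two steps can be made explicit in a short sketch: encode to UTF-8 first, then turn each byte into a triplet. The grinning-face emoji (U+1F600) is an arbitrary choice for illustration, and the characters are written as escapes so the snippet stays ASCII.

```python
from urllib.parse import quote

for ch in ["\u00e9", "\U0001F600"]:   # é and the grinning-face emoji
    utf8 = ch.encode("utf-8")                         # step 1: character -> bytes
    triplets = "".join(f"%{b:02X}" for b in utf8)     # step 2: byte -> %XX
    assert triplets == quote(ch, safe="")             # matches the library result
    print(f"U+{ord(ch):04X}", " ".join(f"{b:02X}" for b in utf8), triplets)
```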

Uppercase vs Lowercase Hex

The hex digits in a percent triplet are case-insensitive, so %2f and %2F represent the same byte. RFC 3986 recommends uppercase for consistency; it is also easier to read and is the common choice in generated URLs.
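
A small sketch of both points: the two spellings decode to the same character, and a regular expression can rewrite existing triplets in uppercase. The uppercase_triplets helper is a hypothetical name used for illustration.

```python
import re
from urllib.parse import unquote

# Both spellings decode to the same character.
assert unquote("%2f") == unquote("%2F") == "/"

def uppercase_triplets(url: str) -> str:
    """Rewrite every percent triplet with uppercase hex digits."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), url)

print(uppercase_triplets("/a%2fb%c3%a9"))  # /a%2Fb%C3%A9
```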

Normalization and Canonical URLs

Consistent encoding helps canonicalization. Equivalent URLs that are encoded differently, such as /~user and /%7Euser, may be treated as separate resources by search engines and caches, especially if the server also responds to them differently. For public links, choose one canonical form and redirect or generate links consistently.

A good rule is to keep unreserved characters unencoded, encode data in each component, and avoid double encoding.
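
The rule can be sketched in two parts: restoring unreserved characters that were encoded unnecessarily, and seeing what double encoding looks like so it can be recognized. The decode_unreserved helper is a hypothetical name and not a complete URL normalizer.

```python
import re
import string
from urllib.parse import quote

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def decode_unreserved(url: str) -> str:
    """Replace triplets that stand for unreserved characters with the character."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        # Leave every other triplet (reserved or non-ASCII bytes) as it was.
        return ch if ch in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, url)

print(decode_unreserved("/%7Euser/a%2Fb"))  # /~user/a%2Fb

# Double encoding: encoding an already encoded value turns % into %25.
once = quote("my file", safe="")
twice = quote(once, safe="")
print(once, twice)  # my%20file my%2520file
```

A value containing %25 followed by two hex digits is often a sign that it was encoded twice somewhere along the way.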

Common RFC 3986 Pitfalls

A literal percent sign must be encoded as %25. If 100% organic is placed in a URL with the percent sign left as-is, a decoder will try to read the % and the two characters after it as an escape sequence, which is either invalid or decodes to the wrong byte.
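
The sketch below shows the failure and the fix. The label "100%beef" is a hypothetical value chosen because the characters after the percent sign happen to be valid hex digits, so the decoder consumes them.

```python
from urllib.parse import quote, unquote

# Encoding the literal percent sign first keeps the text intact.
print(quote("100% organic", safe=""))  # 100%25%20organic

label = "100%beef"  # hypothetical label with a literal percent sign

# Left unencoded, "%be" is read as an escape and the text is corrupted.
assert unquote(label) != label

# Encoded first, the label survives a round trip.
encoded = quote(label, safe="")
print(encoded)  # 100%25beef
assert unquote(encoded) == label
```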

+ is not a general replacement for spaces in every URL component. It is a form-encoding convention for query strings and request bodies using application/x-www-form-urlencoded. In a path segment, + is a literal plus sign, so spaces there should be encoded as %20.
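
Python's urllib.parse keeps the two conventions in separate functions, which makes the difference easy to see. This is a small illustration with made-up sample values.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

# Generic percent-encoding: space -> %20, and + stays a literal plus on decode.
assert quote("a b", safe="") == "a%20b"
assert unquote("a+b") == "a+b"

# Form encoding: space -> +, and a literal plus must become %2B.
assert quote_plus("a b") == "a+b"
assert quote_plus("1+1") == "1%2B1"
assert unquote_plus("a+b") == "a b"

# Query-string parsers apply the form convention.
assert parse_qs("q=a+b&sum=1%2B1") == {"q": ["a b"], "sum": ["1+1"]}
```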

Frequently Asked Questions

Are URLs ASCII only?

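In their transmitted form, yes. A URI as defined by RFC 3986 contains only ASCII characters: non-ASCII characters in the path, query, or fragment are UTF-8 encoded and then percent-encoded, and international host names are converted with IDNA. Browsers may display readable Unicode, but they serialize it to this ASCII form for requests.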
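
A rough sketch of that serialization, assuming a hypothetical international domain and path. Python's built-in idna codec implements the older IDNA 2003 rules, so browsers, which follow newer rules, can differ for some names.

```python
from urllib.parse import quote

host = "b\u00fccher.example"   # "bücher.example", a hypothetical domain
path = "/caf\u00e9"            # "/café"

ascii_host = host.encode("idna").decode("ascii")    # host: IDNA / Punycode
ascii_url = "https://" + ascii_host + quote(path)   # path: UTF-8, then %XX

print(ascii_url)  # https://xn--bcher-kva.example/caf%C3%A9
assert ascii_url.isascii()
```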

Should I encode tilde?

No. ~ is an unreserved character in RFC 3986 and does not need encoding.

Is percent-encoding encryption?

No. Percent-encoding is reversible formatting for transport. It does not hide or protect data.
