String[] testUrls = {
    // Plain, unencoded
    "https://exämple.com/földer/file 1.txt",
    // Properly encoded
    "https://example.com/f%C3%B6lder/file%201.txt",
    // Partially encoded
    "https://example.com/f%C3%B6lder/file 1.txt",
    // Invalid percent sequence
    "https://example.com/100%done.txt",
    // Percent at end
    "https://example.com/file%",
    // Space and umlaut unencoded
    "https://example.com/ä ö ü.txt",
    // Nested folders with mixed encoding
    "https://example.com/a%20b/c d/e%20f.txt"
};
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlNormalizer { // wrapper class name added for illustration only

    public static String normalizeUrl(String url) {
        if (url == null) return null;
        try {
            // Step 1: sanitize invalid % sequences ('%' not followed by two hex digits becomes %25)
            String safe = url.replaceAll("%(?![0-9A-Fa-f]{2})", "%25");
            // Step 2: decode and re-encode (UTF-8); note that URLDecoder also turns '+' into a space
            String decoded = URLDecoder.decode(safe, "UTF-8");
            String encoded = URLEncoder.encode(decoded, "UTF-8");
            // Step 3: URLEncoder encodes spaces as '+', but URLs use '%20'
            encoded = encoded.replace("+", "%20");
            // Step 4: URLEncoder also encodes '/' and ':', so undo that
            encoded = encoded.replace("%2F", "/");
            encoded = encoded.replace("%3A", ":");
            return encoded;
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            // fallback: return original URL unchanged
            return url;
        }
    }
}
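Steps 2 and 3 rely on URLDecoder and URLEncoder, which implement the HTML form format (application/x-www-form-urlencoded) rather than RFC 3986 path encoding. A minimal sketch to check the consequences, assuming Java 10+ for the Charset overloads; the class and method names are made up for illustration, and the two inputs are not taken from the test list:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: replays Steps 2 and 3 of normalizeUrl on a single string.
class FormEncodingPitfalls {
    static String steps2and3(String s) {
        // Form decoding: "%XX" becomes a byte, and '+' becomes a space.
        String decoded = URLDecoder.decode(s, StandardCharsets.UTF_8);
        // Form encoding: space becomes '+', and characters such as ? = & become %XX.
        return URLEncoder.encode(decoded, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // The literal '+' characters come back as encoded spaces.
        System.out.println(steps2and3("C++.txt"));   // C%20%20.txt
        // '?', '=' and '&' are escaped; Step 4 only restores '/' and ':'.
        System.out.println(steps2and3("a?x=1&y=2")); // a%3Fx%3D1%26y%3D2
    }
}
```

So a file literally named C++.txt would be renamed, and a URL with a query string would be mangled. None of the seven test URLs contains '+', '?', '=' or '&', which is why the output that follows does not reveal this.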
Original: https://exämple.com/földer/file 1.txt
Normalized: https://ex%C3%A4mple.com/f%C3%B6lder/file%201.txt
-------------------------------
Original: https://example.com/f%C3%B6lder/file%201.txt
Normalized: https://example.com/f%C3%B6lder/file%201.txt
-------------------------------
Original: https://example.com/f%C3%B6lder/file 1.txt
Normalized: https://example.com/f%C3%B6lder/file%201.txt
-------------------------------
Original: https://example.com/100%done.txt
Normalized: https://example.com/100%25done.txt
-------------------------------
Original: https://example.com/file%
Normalized: https://example.com/file%25
-------------------------------
Original: https://example.com/ä ö ü.txt
Normalized: https://example.com/%C3%A4%20%C3%B6%20%C3%BC.txt
-------------------------------
Original: https://example.com/a%20b/c d/e%20f.txt
Normalized: https://example.com/a%20b/c%20d/e%20f.txt
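In the first result the host exämple.com came out as ex%C3%A4mple.com. A DNS lookup needs an internationalized host name in its ASCII-compatible (Punycode) form instead, and the JDK provides java.net.IDN for that conversion. A small sketch, with a hypothetical class name, that treats only the host:

```java
import java.net.IDN;

// Hypothetical helper: converts a host name to its ASCII (Punycode) form.
class HostNames {
    static String toAsciiHost(String host) {
        // Throws IllegalArgumentException if the input is not a valid IDN.
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        String ascii = toAsciiHost("ex\u00e4mple.com");    // \u00e4 is 'ä'
        System.out.println(ascii);                         // xn--exmple-cua.com
        System.out.println(IDN.toUnicode(ascii));          // exämple.com
        System.out.println(toAsciiHost("example.com"));    // example.com (already ASCII)
    }
}
```

In a complete normalizer the host would presumably be split off first and converted this way, with only the path going through percent-encoding.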
I like the general idea of decoding first, i.e. "normalizing to unencoded", and then encoding again, but it seems surprisingly hard to come up with a robust implementation. Is there a standard solution to this problem?
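One way to make the "decode, then re-encode" idea less fragile is to apply it per path segment with an RFC 3986 encoder instead of the form encoder. The sketch below is an illustration under assumptions, not a standard API (the class and method names are hypothetical): it splits the path on '/', decodes only well-formed %XX triplets (a stray '%' and '+' stay literal), then percent-encodes every UTF-8 byte outside the unreserved set A-Z a-z 0-9 - . _ ~. Because it splits before decoding, an encoded %2F inside a segment survives. It handles only the path, and it cannot resolve the inherent ambiguity of input such as 100%ab (a valid escape, or a literal '%' followed by "ab"?); invalid UTF-8 after decoding becomes U+FFFD, and sub-delimiters such as '+' come out as %2B.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: normalizes a URL path segment by segment (RFC 3986 style).
class PathNormalizer {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Value of an ASCII hex digit, or -1 (non-ASCII UTF-8 bytes are negative and never match).
    private static int hexValue(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        return -1;
    }

    // Decodes only well-formed %XX triplets; a stray '%' and '+' are kept as literal characters.
    static String lenientDecode(String segment) {
        byte[] in = segment.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length
                    && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
                out.write(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
                i += 2;
            } else {
                out.write(in[i]);
            }
        }
        // Invalid UTF-8 sequences are replaced with U+FFFD here.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Percent-encodes every UTF-8 byte outside RFC 3986's unreserved set, using uppercase hex.
    static String encodeSegment(String segment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }

    // Splits on '/', normalizes each segment, and joins them again.
    static String normalizePath(String path) {
        String[] segments = path.split("/", -1); // -1 keeps empty leading/trailing segments
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) sb.append('/');
            sb.append(encodeSegment(lenientDecode(segments[i])));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] paths = {"/f\u00f6lder/file 1.txt", "/f%C3%B6lder/file 1.txt",
                "/100%done.txt", "/file%", "/a%20b/c d/e%20f.txt", "/a%2Fb/C++.txt"};
        for (String p : paths) {
            System.out.println(p + "  ->  " + normalizePath(p));
        }
    }
}
```

For the paths of the question's test URLs this yields the same paths as the Normalized column (for example /100%done.txt becomes /100%25done.txt), while /a%2Fb/C++.txt becomes /a%2Fb/C%2B%2B.txt. Applying normalizePath to its own output returns it unchanged.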
More details here: https://stackoverflow.com/questions/797 ... ded-or-not