Re: QP (was Re: gcal 2.40)

From: Tomas Gradin (tg_at_bosun.bm.lu.se)
Date: 1997-05-26 18:01:41

Next message: Jan D.: "Re: QP (was Re: gcal 2.40)"
Previous message: Tomas Gradin: "Re: QP (was Re: gcal 2.40)"
Maybe in reply to: David: "QP (was Re: gcal 2.40)"
Next in thread: Jan D.: "Re: QP (was Re: gcal 2.40)"
Reply: Jan D.: "Re: QP (was Re: gcal 2.40)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

>OK, the standard says:
>   Octets must be encoded if they have no corresponding graphic
>   character within the US-ASCII coded character set, if the use of the
>   corresponding character is unsafe, or if the corresponding character
>   is reserved for some other interpretation within the particular URL
>   scheme.
>
>   No corresponding graphic US-ASCII:
>
>   URLs are written only with the graphic printable characters of the
>   US-ASCII coded character set. The octets 80-FF hexadecimal are not
>   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
>   control characters; these must be encoded.
>
>I cannot read this as saying that "å" (which is not in US-ASCII) is OK in
>a URL.

Then read this, taken from the *standards* HTML/3.0 and HTTP/1.0:

HTML/3.0:

   Character sets
          The charset parameter (as defined in section 7.1.1 of RFC 1521)
          may be used with the text/html content type to specify the
          encoding used to represent the HTML document as a sequence of
          bytes. Normally, text/* media types specify a default of
          US-ASCII for the charset parameter. However, for text/html, if
*         the byte stream contains data that is not in the 7-bit US-ASCII
*         set, the HTML interpreting agent should assume a default
*         charset of ISO-8859-1.

HTTP/1.0 (<http://www.w3.org/pub/WWW/Protocols/rfc1945/rfc1945>):

    3.2.1 General Syntax

   URIs in HTTP/1.0 can be represented in absolute form or relative to
   some known base URI [9], depending upon the context of their use. The
   two forms are differentiated by the fact that absolute URIs always
   begin with a scheme name followed by a colon.


       URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]

       absoluteURI    = scheme ":" *( uchar | reserved )

       relativeURI    = net_path | abs_path | rel_path

       net_path       = "//" net_loc [ abs_path ]
       abs_path       = "/" rel_path
       rel_path       = [ path ] [ ";" params ] [ "?" query ]

       path           = fsegment *( "/" segment )
       fsegment       = 1*pchar
       segment        = *pchar

       params         = param *( ";" param )
       param          = *( pchar | "/" )

       scheme         = 1*( ALPHA | DIGIT | "+" | "-" | "." )
       net_loc        = *( pchar | ";" | "?" )
       query          = *( uchar | reserved )
       fragment       = *( uchar | reserved )

       pchar          = uchar | ":" | "@" | "&" | "="
       uchar          = unreserved | escape
       unreserved     = ALPHA | DIGIT | safe | extra | national

       escape         = "%" hex hex
       hex            = "A" | "B" | "C" | "D" | "E" | "F"
                      | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT

       reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
       safe           = "$" | "-" | "_" | "." | "+"
       extra          = "!" | "*" | "'" | "(" | ")" | ","
*      national       = <any OCTET excluding CTLs, SP,
*                        ALPHA, DIGIT, reserved, safe, and extra>

*  For definitive information on URL syntax and semantics, see RFC 1738
*  [4] and RFC 1808 [9]. The BNF above includes national characters not
*  allowed in valid URLs as specified by RFC 1738, since HTTP servers are
*  not restricted in the set of unreserved characters allowed to
*  represent the rel_path part of addresses, and HTTP proxies may receive
*  requests for URIs not defined by RFC 1738.


Pay particular attention to the marked lines. They show clearly that
ISO 8859-1 characters such as å, ä, ö and ~ are allowed in URIs as HTTP/1.0
defines them, because they belong to the "national" group and not to
"reserved". The text also states explicitly that this differs from how
RFC 1738 defines URLs.
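
To make the grouping concrete, here is a minimal Python sketch that sorts a
single octet into the character classes of the BNF quoted above. It follows
the grammar literally; the function and constant names are made up for this
illustration and do not come from any library.

```python
import string

# Character classes from the RFC 1945 section 3.2.1 BNF, as sets of octets.
# All names here are hypothetical, chosen only for this illustration.
ALPHA = set(string.ascii_letters.encode("ascii"))
DIGIT = set(string.digits.encode("ascii"))
RESERVED = set(b";/?:@&=")
SAFE = set(b"$-_.+")
EXTRA = set(b"!*'(),")
CTL = set(range(0x00, 0x20)) | {0x7F}
SP = {0x20}


def classify_octet(octet: int) -> str:
    """Return the BNF class name for one octet (0-255)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    for name, members in (
        ("CTL", CTL),
        ("SP", SP),
        ("ALPHA", ALPHA),
        ("DIGIT", DIGIT),
        ("reserved", RESERVED),
        ("safe", SAFE),
        ("extra", EXTRA),
    ):
        if octet in members:
            return name
    # "national" is defined by exclusion: any octet not matched above.
    return "national"


# "å" is the single octet 0xE5 in ISO 8859-1.
print(classify_octet("å".encode("iso-8859-1")[0]))  # national
print(classify_octet(ord("~")))                     # national
print(classify_octet(ord("/")))                     # reserved
```

Because "national" is defined purely by exclusion, a literal reading also
puts octets such as "%" and "#" in that class; the sketch does not try to
resolve that ambiguity.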

Furthermore, section 1.2.1 of the HTML standard (see above) says the following:

"/.../ For example, the value of the HREF attribute of the <A> element must
conform to the URI syntax." 

The URI standard has no objection to 8859-1 either.

Thus, a URI such as <http://www.lysator.liu.se/åttabitars/> is not a URL
according to RFC 1738, but it is an acceptable address in an A tag in an
HTML document.
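
For comparison, the same address can be brought into RFC 1738 form by
escaping the eight-bit octets. The sketch below uses Python's standard
urllib.parse for this. The helper name escape_path is hypothetical, and the
choice of ISO-8859-1 as the byte encoding is an assumption taken from the
HTML/3.0 default charset quoted above; a given server may expect something
else.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path(url: str, charset: str = "iso-8859-1") -> str:
    """Percent-encode non-ASCII characters in the path part of a URL.

    Hypothetical helper for illustration. The charset default is an
    assumption based on the HTML/3.0 text, not something the URL states.
    """
    parts = urlsplit(url)
    # Keep "/" as the segment separator; encode everything else that
    # quote() does not consider safe, using the given charset.
    escaped = quote(parts.path, safe="/", encoding=charset)
    return urlunsplit(
        (parts.scheme, parts.netloc, escaped, parts.query, parts.fragment)
    )


print(escape_path("http://www.lysator.liu.se/åttabitars/"))
# http://www.lysator.liu.se/%E5ttabitars/
```

The escaped form uses only US-ASCII graphic characters, so it also
satisfies the RFC 1738 text quoted at the top of this message.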

That Netscape cannot handle this is therefore a clear bug.

/tg


Archive generated by hypermail 2.1.1.