Subtle changes, bugs and feedback


•

Dramatic_Hammer

Assume - occasionally knocking out the useful part has got to be better than always removing some or all of the most important part, shirley?

•

Velocio in reply to @Dramatic_Hammer

@Dramatic_Hammer and @bothwell

If a URL is:

http://uk.weather.com/weather/today-London-UKXX0085

And we parsed the URL, and kept only the last bit and the domain (in your example):

uk.weather.com/.../today-London-UKXX0085

That works for that URL.
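As a rough sketch of that "domain plus last bit" rule (this is only an illustration, not the code the site runs; the function name `domain_plus_last_segment` is made up for the example):

```python
from urllib.parse import urlsplit

def domain_plus_last_segment(url: str) -> str:
    """Keep the host and the final path segment, eliding the middle."""
    parts = urlsplit(url)
    # Drop empty strings produced by leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return parts.netloc
    if len(segments) == 1:
        # Nothing in the middle to elide.
        return f"{parts.netloc}/{segments[0]}"
    return f"{parts.netloc}/.../{segments[-1]}"

print(domain_plus_last_segment("http://uk.weather.com/weather/today-London-UKXX0085"))
# uk.weather.com/.../today-London-UKXX0085
```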

But if the URL is this:

https://www.google.co.uk/search?client=ubuntu&channel=fs&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI#channel=fs&gl=uk&q=london+weather&spell=1

Then should the answer be:

https://www.google.co.uk/search

Which is meaningless.

And since the page is identified by the query string, and its real meaning lies entirely within it... should we keep that and lose the path?

https://www.google.co.uk/...?client=ubuntu&channel=fs&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI#channel=fs&gl=uk&q=london+weather&spell=1

Or lose the last 20% of this?

https://www.google.co.uk/search?client=ubuntu&channel=fs&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI#channel...

URLs are not identified solely by the path. Whilst a lot of URLs can be identified by the domain plus path, a not insignificant number can only be identified by the query string... and the important parts of that could be anywhere within the query string.

In the example above, the most important part (q=london+weather) sits in the centre of the URL.
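To make that concrete, here is a small sketch that splits the Google URL with Python's standard library. It is only for illustration, and the variable names are my own:

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://www.google.co.uk/search?client=ubuntu&channel=fs&q=london+weather"
       "&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI"
       "#channel=fs&gl=uk&q=london+weather&spell=1")

parts = urlsplit(url)
query = parse_qs(parts.query)  # decodes '+' to a space

print(parts.netloc)  # www.google.co.uk
print(parts.path)    # /search
print(query["q"])    # ['london weather']
```

The path alone is just /search; the part a human cares about is one parameter buried in the middle of the query string.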

Most of the specs on this, and most of the way in which web sites are constructed, mean that the URLs follow a hierarchy of resources... that the left-most part of a path is a higher level section of a site hierarchy.

The leftmost part does not strongly identify a single specific resource, but it does strongly identify the area to which the resource belongs. This is an important distinction, because providing a basis for trust is why we're showing anything at all.

An example could be reddit... if we removed the middle part of a URL, an untrusted section /b/ might fool you into visiting a URL because you trust the last part of the path (it meets your expectations). The middle part, which you wish to remove, is a stronger candidate for trust than the last part.

There is no flawless way to determine, or guess with any accuracy, which part of a URL is the best bit for trust. But the way in which the web has evolved means that you trust the domain first, the leftmost part of the path next... and you move rightwards.

This is why we're going to keep on using the most significant part. There will always be exceptions, but without understanding the URL structure for every destination, a good default general rule is to keep as much of the leftmost portion of a URL as possible.
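A minimal sketch of that default rule, assuming a simple character limit (the function name `keep_leftmost` and the `max_len` of 60 are invented for the example, not the values the site actually uses):

```python
def keep_leftmost(url: str, max_len: int = 60) -> str:
    """Truncate from the right so the domain and leading path survive."""
    if len(url) <= max_len:
        return url
    # Reserve three characters for the trailing ellipsis.
    return url[: max_len - 3] + "..."

long_url = ("https://www.google.co.uk/search?client=ubuntu&channel=fs&q=london+weather"
            "&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI"
            "#channel=fs&gl=uk&q=london+weather&spell=1")

print(keep_leftmost(long_url))
print(keep_leftmost("http://uk.weather.com/weather/today-London-UKXX0085"))
```

The short weather URL comes through untouched, and the long one always keeps its scheme, domain and the start of the path, whatever gets lost further right.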
