@Dramatic_Hammer and @bothwell
If a URL is:
http://uk.weather.com/weather/today-London-UKXX0085
And we parsed the URL and kept only the domain and the last bit of the path (as in your example):
uk.weather.com/.../today-London-UKXX0085
That works for that URL.
But if the URL is this:
https://www.google.co.uk/search?client=ubuntu&channel=fs&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI#channel=fs&gl=uk&q=london+weather&spell=1
Then should the answer be:
http://www.google.co.uk/search
Which is meaningless.
And as the page is identified by the query string, and the real meaning of the page is actually all within it... should we keep that and lose the path?
http://www.google.co.uk/...?client=ubuntu&channel=fs&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI#channel=fs&gl=uk&q=london+weather&spell=1
Or lose the last 20% of this?
http://www.google.co.uk/search?client=ubuntu&channel=fs&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr&ei=8o8HVNWtLIav7AbbsIEI#channel...
URLs are not determined solely by the path. Whilst a lot of URLs can be identified by the domain plus path, a not insignificant number can only be identified by the query string... and the important parts of that could be anywhere within the query string.
In the example above, the most important part (q=london+weather) sits in the centre of the URL string.
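To make that concrete, here is a rough sketch that splits the Google URL from above with Python's standard urllib.parse and looks at where the search term sits. The variable names are just for illustration, nothing here is taken from an existing implementation.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://www.google.co.uk/search?client=ubuntu&channel=fs"
       "&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr"
       "&ei=8o8HVNWtLIav7AbbsIEI#channel=fs&gl=uk&q=london+weather&spell=1")

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each key to a list of decoded values

# Domain plus path alone says nothing about what the page shows.
domain_and_path = parts.netloc + parts.path

# The identifying parameter is neither first nor last in the query string.
q_offset = parts.query.find("&q=")

print(domain_and_path)
print(params["q"])
print(q_offset, "of", len(parts.query))
```

The parameter that matters is buried among ones that don't, and nothing in the URL itself marks it as the important one.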
Most of the specs on this, and the way in which most web sites are constructed, mean that URLs follow a hierarchy of resources... the left-most part of a path is a higher-level section of the site hierarchy.
The left-most part does not strongly identify a single specific resource, but it does strongly identify the area to which the resource belongs. This is an important distinction, because giving you a basis for trust is the reason we're showing anything at all.
An example could be reddit... if we removed the middle part of a URL, an untrusted section such as /b/ might fool you into visiting a URL because you trust the last part of the path (it meets your expectations). The middle part, which you wish to remove, is a stronger candidate for trust than the last part.
There is no flawless way to determine, or guess with any accuracy, which part of a URL is the best bit for trust. But the way in which the web has evolved means that you trust the domain first, the left-most part of the path next... and you move rightwards.
This is why we're going to keep on using the most significant part. There will always be exceptions, but without understanding the URL structure for every destination, a good general default is to keep as much of the left-most portion of a URL as possible.
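As a minimal sketch of that default rule (not the actual code, just an illustration), the following drops the scheme, always keeps the whole domain, and then keeps as much of the rest as fits, reading left to right. The function name shorten_url and the max_len parameter are made up for this example.

```python
from urllib.parse import urlsplit


def shorten_url(url: str, max_len: int = 60) -> str:
    """Keep the domain and as much of the left-most remainder as fits."""
    parts = urlsplit(url)
    display = parts.netloc + parts.path
    if parts.query:
        display += "?" + parts.query
    if parts.fragment:
        display += "#" + parts.fragment

    if len(display) <= max_len:
        return display
    # Never cut into the domain: it is the first thing you trust.
    if len(parts.netloc) + 3 >= max_len:
        return parts.netloc
    # Trim from the right, leaving room for the ellipsis.
    return display[:max_len - 3] + "..."


weather = shorten_url("http://uk.weather.com/weather/today-London-UKXX0085")
search = shorten_url(
    "https://www.google.co.uk/search?client=ubuntu&channel=fs"
    "&q=london+weather&ie=utf-8&oe=utf-8&gl=uk&gws_rd=cr"
    "&ei=8o8HVNWtLIav7AbbsIEI#channel=fs&gl=uk&q=london+weather&spell=1",
    max_len=40,
)
print(weather)
print(search)
```

The weather URL fits and comes through whole. The Google one loses its right-hand end, including the search term in this case, but the domain and the start of the path survive intact.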
Assume - occasionally knocking out the useful part has got to be better than always removing some or all of the most important part, shirley?