From Wikipedia, the free encyclopedia
Computing desk
< February 28 << Feb | March | Apr >> March 2 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


March 1 Information

Data issue

After I re-save in a .doc, .xls and so on, in a HDD, which version could be retrievable? 103.67.159.178 ( talk) 18:13, 1 March 2017 (UTC) reply

Normally when you resave a file in a different format, that is in addition to the old format. That is, the different file extension makes the file name unique, so it does not attempt to overwrite the earlier version. However, some file formats may be for export only, and your software may not be able to read those back in. StuRat ( talk) 18:16, 1 March 2017 (UTC) reply
  • This depends on what you mean by "retrievable". Do you mean what file formats can be recovered from a damaged drive? Well, all of them, in theory. In practice, the format of the file doesn't matter, so long as the header (and sometimes, footer) are undamaged. If you're worried about compatibility (i.e., what file format will Joe-Bob my coworker be able to retrieve and use?), then that depends on the software at the other end. If you're concerned about being able to access the data in it after losing access to the software that created it (MS Office it seems), then the newer formats will be better. The newer -x appended formats (.docx, .xlsx) are just .zip files with the data stored in .xml files and the metadata stored in different plaintext formats. So it's fairly trivial to retrieve data from them, even without the original office software. They're also widely supported because they're an open format, so it's trivial to find free software to edit them with. ᛗᛁᛟᛚᚾᛁᚱPants Tell me all about it. 20:31, 1 March 2017 (UTC) reply
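To illustrate the point about .docx being a .zip archive of XML files, here is a minimal sketch that pulls paragraph text out of such a file with nothing but the Python standard library. The function names `docx_paragraphs` and `make_sample_docx` are made up for this example, and the sample archive is a tiny stand-in holding only `word/document.xml`, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx file.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # The visible text of a paragraph is spread over <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny stand-in archive in memory (hypothetical test data)."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in docx_paragraphs(make_sample_docx()):
        print(line)
```

For a real file, pass its path instead of the in-memory sample. Formatting, tables and embedded objects are ignored here; only the text runs are read.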
It's really unclear what you mean. On the face of it, the answer is straightforward: whatever the last version was you saved. However, you've specified file extensions that are now outdated; newer versions of Word and Excel would have .docx and .xlsx as their defaults. If you open an .xls file in a new version of Excel, edit it, and try to save, you may get a warning about 'minor fidelity loss' because the changes you made aren't completely compatible with the old format. In my experience, those really are very minor items and the supposed loss is negligible. If you open a .xlsx file and then save it as .xls, you will end up with two files (and likewise if you do the same thing with .doc/.docx files). Again, saving as the deprecated file type may give you the warning about fidelity loss. However, I'm really unsure of what you're after. Some additional info would give you better answers. Matt Deres ( talk) 03:16, 2 March 2017 (UTC) reply

Knowing the size of a webpage without downloading it

Can I somehow request a web server to tell me the size of a webpage? Could I do this from the Mac terminal? I know there are services that do that for you online, but I want some solution without any third party (that is, me as client and the server). -- Llaanngg ( talk) 19:52, 1 March 2017 (UTC) reply

You send an HTTP HEAD request. The Content-Length header contains the size of the resource. However, this only tells you the size of whatever single resource you requested. If that's all you want, great. But, if you care about things like images and stylesheets included on the page, you'll need to get the document body and fetch said images, etc. In that case you'll want to use some Web crawling library like cURL to automate things. -- 47.138.163.230 ( talk) 20:00, 1 March 2017 (UTC) reply
Images and multimedia are the problem. When I'm downloading something, how does the browser know the size of a file? Couldn't a program just start downloading these elements and report the size of it? -- Llaanngg ( talk) 20:03, 1 March 2017 (UTC) reply
I just told you how the browser knows the size of a file. It fetches the headers and looks at the Content-Length header. I'm sensing a possible XY problem. Are you trying to accomplish something, and if so, what is it? -- 47.138.163.230 ( talk) 20:08, 1 March 2017 (UTC) reply
Content-Length is not a required field. That is why some downloads can't show a percent downloaded. 2600:1004:B02D:68F0:E5CD:45AB:4FBA:6ACD ( talk) 23:25, 1 March 2017 (UTC) reply
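A rough sketch of the HEAD approach in Python, using only the standard library. The function name `content_length` is invented for this example; it returns `None` when the server leaves the header out. It does not follow redirects, so a redirecting URL reports the size of the redirect response rather than the final page.

```python
import http.client
from urllib.parse import urlsplit


def content_length(url, timeout=10):
    """Send a HEAD request and return Content-Length as an int, or None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the headers only; no body is transferred.
        conn.request("HEAD", path)
        response = conn.getresponse()
        value = response.getheader("Content-Length")
    finally:
        conn.close()
    # The header is optional, so its absence is a normal outcome.
    return int(value) if value is not None else None
```

Calling `content_length("https://example.org/")` (a placeholder URL) would give the size of that single resource only, not of the images or stylesheets it refers to.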
Some browsers have options to disallow certain elements from loading, like pics and vids. Maybe yours can do this, and then you wouldn't need to worry about the size so much. StuRat ( talk) 22:09, 1 March 2017 (UTC) reply
Like someone else said, your question isn't entirely clear due to the xy-problem.
If your question is how to query a server about the size of a page including css, pics, vids, ads, javascripts, etc., the answer is that it can't be done, because:
  • the http protocol doesn't define a method to do so (HEAD only gives the size of the primary resource, and only if you're lucky)
  • the server can't know the answer, because:
    • some of the secondary resources may be on another server
    • loading of secondary resources may be dependent on the execution of client-side scripts embedded in html, or even in one of the secondary resources
    • the size of resources may depend on server-side scripting (like php).
Not sure if this is the answer you need. :) Jahoe ( talk) 14:59, 2 March 2017 (UTC) reply
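The first sub-point can be made concrete with a small sketch: the client has to read the HTML itself to find out which secondary resources exist and where they live. The class and function names below are invented for this example, the tag list is deliberately incomplete, and resources added later by client-side scripts or referenced from CSS are not seen at all, which is exactly the limitation described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts, stylesheets and media named in HTML."""

    SRC_TAGS = {"img", "script", "video", "audio", "source", "iframe"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SRC_TAGS and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only stylesheet links are fetched as part of rendering the page.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def list_resources(html, base_url):
    """Return absolute URLs of the secondary resources found in `html`."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    return parser.resources


def other_hosts(resources, base_url):
    """Return the hosts, other than the page's own, that serve resources."""
    own = urlsplit(base_url).netloc
    return {urlsplit(url).netloc for url in resources} - {own}
```

Each URL returned by `list_resources` would still need its own request to learn its size, and any host reported by `other_hosts` is one the original server cannot answer for.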
Also, local cache of resources may make the actual transfer smaller, depending on what needs to be fetched remotely vs. locally. -- Jayron 32 16:05, 2 March 2017 (UTC) reply
And on the side line: the perceived speed of a webpage showing up often depends more on javascript execution time than on download time. Jahoe ( talk) 16:24, 2 March 2017 (UTC) reply
Agreed. Specifically, if they could load the page completely, in order, from top down, and allow scrolling while it loads, this would avoid the problem of it being unreadable while loading, assuming you want to read it from the top down. Also, if it could load in background, and tell me when ready, then I could do something else while waiting. They don't seem to have thought these concepts through. StuRat ( talk) 16:45, 2 March 2017 (UTC) reply
My experience is that ads take forever to load. When I see that some of the page is visible, I click the stop button. It stops trying to load ads, trackers, social crap, etc... and I see the content just fine. 209.149.113.5 ( talk) 17:09, 2 March 2017 (UTC) reply
Now seriously, can I request a web server to tell me the size of a webpage? Could I do this from the Mac terminal? Llaanngg ( talk) 17:51, 2 March 2017 (UTC) reply
An HTTP server is not obligated to provide the size of the webpage: the HTTP standard allows it to leave that field out. If the server populates the field, you can use a command like:
curl --head /info/en/?search=Wikipedia:Reference_desk/Computing | grep Content-Length
...and you can parse that output further using your favorite tools. If you see no output, ... the server has opted not to provide a Content-Length; and almost definitionally, you can't force somebody else's server to do what you want it to do.
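As one possible way of parsing that output further, here is a small sketch that reads header text such as `curl --head` prints and extracts the value. The function name `content_length_from_headers` is made up for this example. Unlike a plain `grep Content-Length`, it matches the header name case-insensitively, since some servers send it in lower case.

```python
def content_length_from_headers(header_text):
    """Return the last Content-Length value in raw header text, or None.

    If the text holds several header blocks (for example after redirects),
    the value from the final block wins.
    """
    length = None
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if sep and name.strip().lower() == "content-length":
            value = value.strip()
            if value.isdigit():
                length = int(value)
    return length
```

A `None` result corresponds to the "no output" case: the server chose not to send the field.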
Here are some further references: RFC2616, the HTTP 1.1 standard; and §4.4 Message Length. "Section 4.4 describes how to determine the length of a message-body if a Content-Length is not given."
Pay close attention to the warnings at the top - the standardization of the HTTP 1.1 protocol given by RFC2616 has been superseded by multiple newer documents, RFCs 7230-7237. To deduce a web page's "size," conformant to current technical standards and in light of the numerous modern technologies that make up the "web" as seen by regular users, you need to pore over a lot of complicated documentation and specify your terms a lot more precisely than before.
Nimur ( talk) 18:04, 2 March 2017 (UTC) reply
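One case where a length can be worked out without Content-Length is chunked transfer coding, in which the body arrives as a series of size-prefixed chunks. The sketch below is an illustration only, with an invented function name. It assumes the whole chunked body is already in memory as bytes, which means the body has been downloaded; it only shows how the size is derived, not a way to avoid the transfer.

```python
def chunked_body_length(raw):
    """Return the decoded length of a chunked message body given as bytes."""
    total = 0
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            # A zero-sized chunk marks the end of the body.
            return total
        total += size
        # Skip the size line, the chunk data and the CRLF that follows it.
        pos = line_end + 2 + size + 2
```

Trailer headers after the final chunk are ignored, and malformed input raises an exception rather than being repaired.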
You repeat the same question, Llaanngg, without further clarifying it. If you need more information, then be more specific:
  • explain how you define size of a webpage
  • explain what you want to achieve
When your question gets more specific, you can expect more specific answers.  :) Jahoe ( talk) 18:55, 2 March 2017 (UTC) reply
Well, this time I got an appropriate answer. Llaanngg ( talk) 19:22, 2 March 2017 (UTC) reply
Good. Tagging this question as resolved. Jahoe ( talk) 20:53, 2 March 2017 (UTC) reply
The answer you were given this time is basically the same as the answer last time, albeit with reference to different resources and with a specific command line rather than simply the name of the tool you could use, so it's still not clear why you remained confused. For example, if you can't figure out how to actually use a suggested tool, then you should say so and people will find it far easier to help. Nil Einne ( talk) 05:25, 4 March 2017 (UTC) reply
Resolved
