From Wikipedia, the free encyclopedia

Please note that questions about the database download are more likely to be answered on the xmldatadumps-l or wikitech-l mailing lists than on this talk page.

Removing inline external links

@ Philoserf: I note that your recent edits removed many of the inline external links. This makes the page much harder to use: statements such as "If it doesn't work, see the forums" and "The SQL file used to initialize a MediaWiki database can be found here" aren't useful without their external link. This is technical documentation aimed at technical users, not an encyclopedia article, so I'm not convinced that applying the MOS strictly is a good idea. Are you able to redo your edit without losing the external links? -- John of Reading ( talk) 08:04, 8 February 2023 (UTC) reply

undone —¿philoserf? ( talk) 08:05, 8 February 2023 (UTC) reply

Semi-protected edit request on 1 March 2023

Unfortunately, the link to the wiki-as-ebook store is no longer available. The link needs to be removed.

E-book The wiki-as-ebook store provides ebooks created from a large set of Wikipedia articles with grayscale images for e-book-readers (2013). Tibor Brink ( talk) 15:44, 1 March 2023 (UTC) reply

 Done -- John of Reading ( talk) 16:34, 1 March 2023 (UTC) reply

Update: this is still listed as an option in the "Offline Wikipedia Reader" list at the top, and that should be removed as well. — Preceding unsigned comment added by Dfhci ( talkcontribs) 14:09, 17 March 2024 (UTC) reply

@ Dfhci: Also  Done -- John of Reading ( talk) 15:45, 17 March 2024 (UTC) reply

Semi-protected edit request on 15 October 2023

This page states that pages-meta-current.xml.bz2 is over 19 GB compressed, which, although once true, is highly misleading, since the file is now 36.6 GB in size when compressed. 73.249.220.113 ( talk) 02:31, 15 October 2023 (UTC) reply

Not done The page doesn't say anything about the size of pages-meta-current.xml.bz2. The page mentions 19 GB in the context of a different download, pages-articles-multistream.xml.bz2. The latest dump index says that the 19 GB file is now about 22 GB. -- John of Reading ( talk) 07:06, 15 October 2023 (UTC) reply

How to use multistream?

The "How to use multistream?" shows

" For multistream, you can get an index file, pages-articles-multistream-index.txt.bz2. The first field of this index is the number of bytes to seek into the compressed archive pages-articles-multistream.xml.bz2, the second is the article ID, the third the article title.

Cut a small part out of the archive with dd using the byte offset as found in the index. You could then either bzip2 decompress it or use bzip2recover, and search the first file for the article ID.

See https://docs.python.org/3/library/bz2.html#bz2.BZ2Decompressor for info about such multistream files and about how to decompress them with python; see also https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dumps/+/ariel/toys/bz2multistream/README.txt and related files for an old working toy. "
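
To make the quoted recipe concrete, here is a minimal sketch in Python (using the bz2 module from the documentation linked above) that does the seek-and-decompress step directly instead of cutting with dd. The file names and the article title are only examples; substitute whichever dump and page you actually have.

import bz2

# Example file names from the English Wikipedia dump; adjust to your download.
INDEX = "enwiki-latest-pages-articles-multistream-index.txt.bz2"
DUMP = "enwiki-latest-pages-articles-multistream.xml.bz2"
TITLE = "Albert Einstein"  # example title; use the page you want

# 1. Scan the index for the article's byte offset, keeping every distinct
#    offset so we know where the article's stream ends. Each index line is
#    "offset:page_id:title"; titles can contain ":", so split only twice.
offsets = []
target = None
with bz2.open(INDEX, "rt", encoding="utf-8") as f:
    for line in f:
        off, _page_id, title = line.rstrip("\n").split(":", 2)
        off = int(off)
        if not offsets or offsets[-1] != off:
            offsets.append(off)
        if title == TITLE:
            target = off
if target is None:
    raise SystemExit("title not found in index")

# 2. The article's stream runs from its offset to the next distinct offset
#    (or to the end of the file for the last stream).
i = offsets.index(target)
length = offsets[i + 1] - target if i + 1 < len(offsets) else None

# 3. Read only that slice of the big archive and decompress it; this is the
#    Python equivalent of cutting the slice out with dd and running bzip2 on it.
with open(DUMP, "rb") as f:
    f.seek(target)
    blob = f.read(length) if length is not None else f.read()
xml = bz2.BZ2Decompressor().decompress(blob).decode("utf-8")

# xml now holds roughly 100 <page> elements; search it for the wanted title.
print(xml[:1000])

The same byte range could instead be cut out on the command line with dd (skipping and counting in bytes) and decompressed with bzip2, as the quoted text describes.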

I have the index and the multistream, and I can make a live USB flash drive with https://trisquel.info/en/wiki/how-create-liveusb

lsblk

umount /dev/sdX*

sudo dd if=/path/to/image.iso of=/dev/sdX bs=8M; sync

but I do not know how to use dd well enough to "Cut a small part out of the archive with dd using the byte offset as found in the index" and then "either bzip2 decompress it or use bzip2recover, and search the first file for the article ID."

Is there any video or more information on Wikipedia about how to do this, so I can look at Wikipedia pages, or at least the text off-line?

Thank you for your time.

Other Cody ( talk) 22:46, 4 December 2023 (UTC) reply

The thread at https://trisquel.info/en/forum/how-do-you-cut-wikipedia-database-dump-dd
has a reply from someone called Magic Banana with information about how to do this.
Maybe from others as well. Other Cody ( talk) 15:44, 26 January 2024 (UTC) reply