Here is a summary of questions that have popped up here and there over the past couple of months:
Where can one download a pre-2022 (meaning pre-AI slop) copy of the English Wikipedia?
Wikipedia is probably still very much slop-free thanks to its myriad editors and a policy of shooting on sight when AI is suspected, but there is documented (attempted) use of AI-generated content. The answer is torrents, as people are still seeding older versions: here are December 2021 (93 GB) and May 2022 (pre-ChatGPT release; 95 GB). Archive.org also hosts several ZIM files.
How do I manage large downloads? (several distinct questions boiling down to this)
The simplest, surest way is to use torrents, as BitTorrent is a protocol designed to 1. download large files and 2. handle patchy connectivity: if your connection is interrupted, it resumes exactly where it stopped. We recommend Transmission, a client that supports web seeds, but that's largely a matter of personal taste.
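Why can a torrent resume exactly where it stopped? A .torrent file lists a hash for every fixed-size piece of the payload, so a client can check what it already has on disk and fetch only the pieces that are missing or damaged. The minimal Python sketch below mimics that check with SHA-1 piece hashes (as in BitTorrent v1); the tiny piece size and the function names are illustrative assumptions, not how Transmission itself is implemented.

```python
import hashlib

# Illustrative piece size; real torrents use much larger pieces (hundreds of KiB to several MiB).
PIECE_SIZE = 16


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece, like the list stored in a v1 .torrent file."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def missing_pieces(partial: bytes, expected: list[bytes], piece_size: int = PIECE_SIZE) -> list[int]:
    """Indices of pieces that are absent, incomplete or corrupt in a partial download."""
    missing = []
    for index, digest in enumerate(expected):
        chunk = partial[index * piece_size:(index + 1) * piece_size]
        if hashlib.sha1(chunk).digest() != digest:
            missing.append(index)
    return missing


if __name__ == "__main__":
    full = bytes(range(64))          # 4 pieces of 16 bytes
    expected = piece_hashes(full)
    interrupted = full[:40]          # connection dropped after 40 bytes
    print(missing_pieces(interrupted, expected))
```

With the 40-byte interrupted download above, pieces 0 and 1 verify, so only pieces 2 and 3 need fetching again; nothing already verified is downloaded twice.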
Can we get a ZIM file of all Wikipedias together?
The idea here is presumably to be able to go straight from an entry in one language to the corresponding entry in another (quite a few people use it for translations). Unfortunately, interwiki links are not available, so that's not possible (yet?).
Can we get high resolution images?
The short answer is no, even though some resources like WikiMed and iFixit would be much more practical with full-size images. There is a trade-off, however: file size can increase drastically when image compression is removed. Since we cater first to people with low bandwidth, such files would not be practical (but we are thinking of workarounds).
Why would the same ZIM file have different sizes?
Depending on where you look, the same copy of Wikipedia may be listed as 111 or 119 GB. The devil is in the details: 119 is in GB, which stands for gigabyte, a measure of digital information based on powers of 10, where 1 GB = 1,000,000,000 bytes. 111 is actually in GiB, which stands for gibibyte, a unit based on powers of 2, where 1 GiB = 1,073,741,824 bytes. The roughly 7% difference adds up quickly when files are large.
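The arithmetic is easy to check: the same byte count gives two different figures depending on which unit a tool divides by. A minimal Python sketch, using the 111 GiB copy above as the example file:

```python
GB = 1000 ** 3   # gigabyte: based on powers of 10
GiB = 1024 ** 3  # gibibyte: based on powers of 2 (1,073,741,824 bytes)


def describe_size(n_bytes: int) -> str:
    """Express one byte count in both units, as different tools might display it."""
    return f"{n_bytes / GiB:.0f} GiB = {n_bytes / GB:.0f} GB"


if __name__ == "__main__":
    size = 111 * GiB                # a file of exactly 111 GiB
    print(describe_size(size))      # 111 GiB = 119 GB
    print(f"{GiB / GB - 1:.1%}")    # 7.4% larger per unit
```

The ratio between the two units is fixed (1 GiB is about 1.074 GB), so the absolute gap grows with file size: a few MB on a small ZIM file, several GB on a full Wikipedia.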
How do I run Kiwix on TrueNAS?
There's an actual blog post for it already, but just in case: it is pretty straightforward, no weird settings or anything. Create a dataset for it, using the App type. Point to that dataset for the install. Drop your ZIM files into the zim directory it creates on install (it won't run without at least one ZIM file). Change the default storage host path to suit your own configuration.
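Since Kiwix won't start without at least one ZIM file, a quick pre-flight check of the zim directory can save some head-scratching. The Python sketch below is an illustrative helper, not part of the TrueNAS app; the dataset path in it is hypothetical and should be replaced with the zim directory created in your own dataset.

```python
from pathlib import Path


def zim_files(zim_dir: Path) -> list[Path]:
    """List the .zim files directly inside zim_dir (empty list if the directory is missing)."""
    if not zim_dir.is_dir():
        return []
    return sorted(
        p for p in zim_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".zim"
    )


if __name__ == "__main__":
    # Hypothetical host path: replace with the zim directory inside your own dataset.
    zim_dir = Path("/mnt/tank/apps/kiwix/zim")
    found = zim_files(zim_dir)
    if found:
        print(f"{len(found)} ZIM file(s) found:")
        for path in found:
            print(" -", path.name)
    else:
        print(f"No .zim file in {zim_dir}: Kiwix will not start until you add one.")
```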
How do I run Zimit on Windows?
Zimit is our universal scraper: it can capture a wide range of websites, as opposed to dedicated scrapers, which capture one specific architecture with higher fidelity. Bear in mind that it is resource-intensive, so you should start with a powerful machine and a decent internet connection.
If you're comfortable with command-line tools and following technical instructions, this is definitely doable. If some of these concepts are new to you, however, there will be a learning curve, though there are good tutorials out there for Docker basics. Should you get stuck on specific steps, take a look at these FAQs.
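Because running Zimit involves Docker, a sensible first check is whether Docker itself is installed and running. The Python sketch below is an illustrative helper, not part of Zimit: it looks for the `docker` command on PATH and runs `docker info`, which fails when the Docker engine is not reachable (on Windows, typically because Docker Desktop has not been started).

```python
import shutil
import subprocess


def docker_ready() -> bool:
    """Return True if the docker CLI is on PATH and can reach a running engine."""
    docker = shutil.which("docker")
    if docker is None:
        return False  # Docker is not installed, or not on PATH
    try:
        # `docker info` exits with a non-zero code when the engine is not running.
        result = subprocess.run([docker, "info"], capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_ready():
        print("Docker is ready: you can move on to the Zimit instructions.")
    else:
        print("Docker not found or not running: install or start Docker Desktop first.")
```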
How do I know if a site can be zimmed up?
Fun fact: nobody knows, including us! Sometimes a website uses fancy JS that sends our scrapers screaming abuse, sometimes it sits behind very aggressive Cloudflare protection, and so on. And sometimes it just works. We can't know before we try, which is why it is always worth asking!
Can I get an RSS feed of an individual ZIM file?
The short answer is yes! The long answer is probably just "yes!" again: go to browse.library.kiwix.org, play with the filters until only the ZIM file you are looking for appears, then tap the RSS button at the top right of your screen.
Feel free to send any question that crosses your mind to hello@kiwix.org, and we'll regularly post our answers in an easy-to-find blog post.
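Any feed reader will happily take the URL behind that RSS button. For scripting, here is a minimal Python sketch using only the standard library. As an assumption about the feed format, it accepts either RSS 2.0 or Atom and simply lists each entry's title and link; the function names are illustrative.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def feed_entries(xml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_bytes)
    entries = []
    # RSS 2.0: <item><title>...</title><link>...</link></item>
    for item in root.iter("item"):
        entries.append((item.findtext("title", ""), item.findtext("link", "")))
    # Atom: <entry><title>...</title><link href="..."/></entry> (first link only)
    for entry in root.iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        href = link.get("href", "") if link is not None else ""
        entries.append((entry.findtext(f"{ATOM}title", ""), href))
    return entries


def fetch_feed(url: str) -> list[tuple[str, str]]:
    """Download a feed and parse its entries."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return feed_entries(response.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the URL copied from the RSS button as the first argument.
    for title, link in fetch_feed(sys.argv[1]):
        print(title, link)
```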