I'm working on a project which stores single images and text files in one place, like a time capsule. Now, most every project can be saved as one file, like DOC, PPT, and ODF. But complete web pages can't - they're saved as a separate HTML file and data folder. I want to save a web page in a single archive, and while there are several solutions, there's no "standard". Which is the best format for HTML archives?

Microsoft has MHTML - basically a file encoded exactly as a MIME HTML email message. It's already based on an existing standard, and MHTML on its own was proposed as RFC 2557. This is a great idea and it's been around forever, except it's been a "proposed standard" since 1999. Plus, implementations other than IE's are just cumbersome. IE and Opera support it; Firefox and Safari only with a cumbersome extension.

Mozilla has Mozilla Archive Format - basically a ZIP file with the markup and images, with metadata saved as RDF. It's an awesome idea - Winamp does this for skins, and ODF and OOXML for their embedded images. The only extension supporting it hasn't been updated since Firefox 1.5.

Data URIs are becoming more popular. Instead of referencing an external location a la MHTML or MAF, you encode the file straight into the HTML markup as base64. Depending on your view, it's streamlined since the files are right where the markup is. Firefox, Opera, and Safari support it without gaffes; IE, the market leader, only started supporting it at IE8, and even then with limits.

Then of course, there's "Save complete webpage", where the HTML markup is saved as "savedpage.html" and the files in a separate "savedpage_files" folder. But having to handle two separate elements is not simple and streamlined at all. My project needs to have them in a single archive.

Keeping in mind browser support and ease of editing the page, what do you think's the best way to save web pages in a single archive? What would be best as a "standard"? Or should I just buckle down and deal with the HTML file and separate folder? For the sake of my project, I could support that, but I'd best avoid it.

Another crucial question is what exactly you want to store. Is it to store the whole page as it is with all referenced resources - images, ...? Or to capture the page as it was rendered at some point in time, a static image of some rendered state of the web page DOM?

Most current "save page as" functionality in browsers, be it to MAF or MHTML or file+dir, attempts the first way. This is an ultimately flawed approach.

Don't forget that web pages these days are local applications rather than static documents you can easily store. One page is in fact several pages built dynamically by JS; user interaction is needed ... AJAX applications can do remote communication with a remote service rendering it ... Such a resource is then not part of the stored page. Even parsing the JS code may not discover them. You need to run the code. Even the position of basic HTML elements may be computed dynamically by JS, and it is not always possible or easy to recreate it locally. You would need some sort of JS memory dump, and to load this to get the page to the desired state.

Check the Chrome SingleFile extension. It stores a web page to one HTML file with images inlined using the already mentioned data URIs. I haven't tested it much, so I cannot say how well it handles "volatile" AJAX pages.

The problem is that HTML is bottom-up, not top-down. Look at your file name, which saved on my box as "What's the best "file format" for saving complete web pages (images, etc.) in a single archive? - Stack Overflow.html". Just add a '|' and one has trouble doing copy and paste backups to a spare drive ... chopping the file name in order to save it. Dozens, perhaps hundreds, of identical index.html or index.php files are cluttering my drives.

The partial solution is to write your own CMS and use scripts to map all relevant files to a flat file database - then use fileName, size, mtime and md5 to get a unique id for each file. Create a flat file index permitting 100k or 1000k records. The goal is to write once and use many times. So you need a real CMS: you need a unique id based on content (e.g. index8765432.html) that goes in your files_archive. Then you can non-destructively symlink from the saved original HTML to the files_archive and just recreate the file using a PHP or alternative script if need be. Don't know if it will work, as I'm at the same point you're at - maybe in a week I will know for sure.

The more useful approach is to have a top-down structure based on your business or personal wants and related tasks.
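To make the data URI option from the question concrete (encoding each file straight into the HTML markup as base64), here is a minimal Python sketch. It only handles local `<img src="...">` references found with a simple regular expression; the function names `to_data_uri` and `inline_images` are made up for illustration, and a real tool would also need to cover CSS, scripts, and other resources, which this does not attempt.

```python
import base64
import mimetypes
import re
from pathlib import Path


def to_data_uri(path: Path) -> str:
    """Encode a local file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path.name)
    mime = mime or "application/octet-stream"  # fallback for unknown types
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


# Assumption: double-quoted src attributes only; this is not a full HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with data URIs."""

    def swap(match: re.Match) -> str:
        src = match.group(2)
        target = base_dir / src
        # Leave remote URLs, existing data URIs, and missing files untouched.
        if src.startswith(("http:", "https:", "data:", "//")) or not target.is_file():
            return match.group(0)
        return match.group(1) + to_data_uri(target) + match.group(3)

    return IMG_SRC.sub(swap, html)
```

Calling `inline_images(Path("savedpage.html").read_text(), Path("."))` on a page saved with "Save complete webpage" would give a single self-contained HTML string for the images it can resolve. Base64 makes each inlined file roughly a third larger than the original.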
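The flat file index idea from the last answer (fileName, size, mtime and md5 giving a unique id per file) could look roughly like the Python sketch below. The id format, the tab-separated layout, and the names `file_record` and `build_index` are assumptions for illustration, not the answerer's actual scripts. Because the id here is derived from the file stem plus the content hash, identical copies of index.html map to the same id.

```python
import csv
import hashlib
from pathlib import Path


def file_record(path: Path) -> dict:
    """Collect name, size, mtime and md5 for one file, plus a content-based id."""
    stat = path.stat()
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    return {
        # Hypothetical id format: stem + first 12 hex digits of the md5 + suffix.
        "id": f"{path.stem}-{md5[:12]}{path.suffix}",
        "name": path.name,
        "size": stat.st_size,
        "mtime": int(stat.st_mtime),
        "md5": md5,
    }


def build_index(root: Path, index_file: Path) -> int:
    """Write one tab-separated row per file under root; return the row count."""
    fields = ["id", "name", "size", "mtime", "md5", "path"]
    count = 0
    with index_file.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, delimiter="\t")
        writer.writeheader()
        for path in sorted(root.rglob("*")):
            # Skip directories and the index file itself if it lives under root.
            if path.is_file() and path.resolve() != index_file.resolve():
                row = file_record(path)
                row["path"] = str(path.relative_to(root))
                writer.writerow(row)
                count += 1
    return count
```

Rows that share an id are duplicate content, so only one copy needs to go into files_archive and the other locations can be replaced with symlinks to it.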