NAME WWW::Mechanize::Chrome::Webshot - take a screenshot of rendered HTML or a cheap+cheerful html2pdf converter VERSION Version 0.01 SYNOPSIS This module provides "shoot($params)" which loads a specified URL or local file into a spawned, possibly headless, browser (thank you Corion for WWW::Mechanize::Chrome), waits for some settle time, optionally removes specified DOM elements (e.g. advertisements and conscents), takes a screenshot of the rendered content and saves into the output file, as PDF or PNG, optionally adding any specified EXIF tags. At the same time, this functionality can be seen as a round-about way for converting HTML, complete with CSS and JS, to PDF or PNG. And that is no mean feat. Actually it's a mean hack. Did I say that it supports as much HTML, CSS and JS as the modern browser does? Here are some examples: use WWW::Mechanize::Chrome::Webshot; my $shooter = WWW::Mechanize::Chrome::Webshot->new({ 'settle-time' => 10, }); $shooter->shoot({ 'output-filename' => 'abc.png', 'output-format' => 'png', # optional if it can not be deduced # or 'file:///A/B/C.html' # << use absolute filepath! 'url' => 'https://www.902.gr', 'remove-DOM-elements' => [ {'element-xpathselector' => '//div[id="advertisments"]'}, {...} ], 'exif' => [{'created' => 'by the shooter'}, ...], }); ... CONSTRUCTOR new($params) Creates a new Android::ElectricSheep::Automator object. $params is a hash reference used to pass initialization options which may or should include the following: confighash or configfile or configstring Optional, default will be used. The configuration file/hash/string holds configuration parameters and its format is "enhanced" JSON (see "use Config::JSON::Enhanced") which is basically JSON which allows comments between . Here is an example configuration file to get you started, this configuration is used as default when none is provided: and <% verbatim sections %> */> { "debug" : { "verbosity" : 1, "cleanup" : 1 }, "logger" : { }, "constructor" : { "settle-time" : "3", "resolution" : "1600x1200", "stop-on-error" : "0", "remove-dom-elements" : [] }, "WWW::Mechanize::Chrome" : { "headless" : "1", "launch_arg" : [ "--window-size=600x800", "--password-store=basic", "--disable-gpu", "--ignore-certificate-errors", "--disable-background-networking", "--disable-client-side-phishing-detection", "--disable-component-update", "--disable-hang-monitor", "--disable-save-password-bubble", "--disable-default-apps", "--disable-infobars", "--disable-popup-blocking" ] } } All sections in the configuration are mandatory. confighash is a hash of configuration options with structure as above and can be supplied to the constructor instead of the configuration file. If no configuration is specified, then a default configuration will be used. This is hardcoded in the source code. logger or logfile Optional. Specify a logger object which adheres to Mojo::Log's API or a logfile to write log info into. It must implement methods info(), error(), warn(). verbosity Optional. Verbosity level as an integer, default is 0, silent. cleanup Optional. Cleanup all temporary files after exit. Default is 1 (yes). It is useful when debugging. settle-time Optional. Seconds to wait between loading the specified URL and taking the screenshot. This is very important if target URL has lots to do or on a slow connection. Default is 2 seconds. resolution Optional. The size of the browser in WxH format. Default is 1600x1200. headless Optional. When debugging you may find it useful to display the browser while it loads the URL. Set this to 1 if you want this. Default is 0 (yes, headless). Make sure you specify a huge settle-time with this because the browser will shutdown as soon as the screenshot is taken. remove-dom-elements Optional. After the URL is loaded and settle time has passed, DOM elements can be removed. Annoyances like advertisements, consents, warnings can be zapped by specifying their XPath selectors. This is an ARRAY_REF of HASH_REF. Each HASH_REF is a selector for DOM elements to be zapped. See https://metacpan.org/pod/WWW::Mechanize::Chrome::DOMops#ELEMENT-SELECTORS on the exact spec of the DOM selectors. WWW::Mechanize::Chrome Optional. Specify any parameters to be passed on to the constructor of WWW::Mechanize::Chrome. METHODS shoot($params) It takes a screenshot of the specified URL as rendered by WWW::Mechanize::Chrome (usually headless) and saves it as an image to the specified file. Input parameters $params: url specifies the target URL or URI pointing to a local file (e.g. file:///A/B/C.html, use absolute filepath). remove-dom-elements specifies DOM elements to be removed after the URL has been loaded and settle time has passed. Annoyances like advertisements, consents, warnings can be zapped by specifying their XPath selectors. This is an ARRAY_REF of HASH_REF. Each HASH_REF is a selector for DOM elements to be zapped. See https://metacpan.org/pod/WWW::Mechanize::Chrome::DOMops#ELEMENT-SELECTORS on the exact spec of the DOM selectors. Note that a parameter with the same name can be specified in the constructor. If one is specified here, then the one specified in the constructor will be ignored, else, it will be used. exif optionally specify one or more EXIF tags to be inserted into the output image. If one is specified here, then any specified in the constructor will be ignored. shutdown() It shutdowns the current WWW::Mechanize::Chrome object, if any. scroll_to_bottom() It scrolls the browser's contents to the very bottom without changing its horizontal position. scroll($w, $h) It scrolls the browser's screen by $w pixels in the horizontal direction and by $h pixels in the vertical direction. mech_obj() It returns the currently used WWW::Mechanize::Chrome object. SCRIPTS For convenience, the following scripts are provided: script/www-mechanize-webshot.pl It will take a URL, load it, render it, optionally zap any specified DOM elements and save the rendered content into an output image: This will save the screenshot and also adds the specified exif data: script/www-mechanize-webshot.pl --url 'https://www.902.gr' --resolution 2000x2000 --exif 'created' 'bliako' --output-filename '902.png' --settle-time 10 Debug why the output is not what you expect, show the browser and let it live for huge settle time: script/www-mechanize-webshot.pl --no-headless --url 'https://www.902.gr' --resolution 2000x2000 --output-filename '902.png' --settle-time 100000 This will also remove specified DOM elements by tag name and XPath selector. Note that the output format will be deduced as PDF because of the filename: script/www-mechanize-webshot.pl --remove-dom-elements '[{\"element-tag\":\"div\",\"element-id\":\"sickle-and-hammer\",\"&&\":\"1\"},{\"element-xpathselector\":\"//div[id=ads]\"}]' --url 'https://www.902.gr' --resolution 2000x2000 --exif 'created' 'bliako' --output-filename '902.pdf' --settle-time 10 Explicitly save the output as PDF: script/www-mechanize-webshot.pl --url 'https://www.902.gr' --resolution 2000x2000 --exif 'created' 'bliako' --output-filename 'tmpimg' --output-format 'PDF' --settle-time 10 REQUIREMENTS This module requires that the Chrome browser is installed in your computer and can be found by WWW::Mechanize::Chrome. The browser will be run, usually headless -- so a headless desktop is fine, the first time you take a screenshot. It will only be re-spawned if you have shutdown the browser in the meantime. Exiting your script will shutdown the browser. And so, running a script again will re-spawn the browser (AFAIK). AUTHOR Andreas Hadjiprocopis, BUGS Please report any bugs or feature requests to bug-www-mechanize-chrome-webshot at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Mechanize-Chrome-Webshot. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes. SUPPORT You can find documentation for this module with the perldoc command. perldoc WWW::Mechanize::Chrome::Webshot You can also look for information at: * RT: CPAN's request tracker (report bugs here) https://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Mechanize-Chrome-Webshot * AnnoCPAN: Annotated CPAN documentation http://annocpan.org/dist/WWW-Mechanize-Chrome-Webshot * CPAN Ratings https://cpanratings.perl.org/d/WWW-Mechanize-Chrome-Webshot * Search CPAN https://metacpan.org/release/WWW-Mechanize-Chrome-Webshot ACKNOWLEDGEMENTS LICENSE AND COPYRIGHT Copyright 2019 Andreas Hadjiprocopis. This program is free software; you can redistribute it and/or modify it under the terms of the the Artistic License (2.0). You may obtain a copy of the full license at: http://www.perlfoundation.org/artistic_license_2_0 Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license. If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license. This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder. This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed. Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.