Google-chrome – How to convert a webpage to PDF while preserving its look (exactly as in the web browser) and its text/links

browser, firefox, google-chrome, pdf, printing

I'm looking for a way to convert a webpage to PDF while preserving the webpage's look. The text should also stay selectable and searchable (an image screenshot of the webpage would make the text neither selectable nor searchable).

I want to print the webpage to PDF as is (as shown in the web browser), without any changes to style or alignment and without losing any of the webpage's static components.

This would help in keeping offline copies of webpages that are easily readable, annotatable and searchable.


You don't need to read anything below to understand my question (the question is just the section above). The following section only lists what I've found through research or others' answers while trying to reach an answer.

Research Outcomes (Suggestions that didn't solve my problem)

Outcomes so far of trying to find a solution (none of them solves this question yet)

I've tried the following PDF web printing engines, but all of them alter the page's look, and some damage it so much that it becomes hardly readable (example page screenshots are listed in square brackets):

  • Chrome [Original, Print Styles (Disabled | not Disabled)]
  • Firefox [Original, Print Styles (Disabled p1,p2 | not Disabled p1,p2)]
  • Readability
    • It simplifies the webpage, which is good for focused reading. However, this isn't what I'm looking for: I want to keep all of the webpage's positions and style properties as seen in the web browser, in PDF format, without any manipulation.
  • Foxit Reader
  • NovaPDF
  • CutyCapt [Original, Zoom Factor: 0.4: Screenshots, Outputted PDF]
    • I'll add links after I solve the program's running issues on Windows.
  • wkhtmltopdf [Original, Zoom Factor: 0.4: Screenshots, Outputted PDF]
    • It doesn't support CSS3.

Webpage screenshot capture plugins (e.g. Abduction, Awesome Screenshot, Fireshot, Firefox Screenshot Developer Tool, Full Page Screen Capture, Page2Images, web-capture, …) don't answer my question, because they don't preserve text and links.

Scrible is great at preserving webpages as is for further annotation and research, but unfortunately it works only online and doesn't convert to PDF.

There are two other questions on the community that are somewhat similar to mine; this one differs slightly, but in important ways:

More similar questions where preserving text and links isn't a requirement (pages are mostly captured as image screenshots):


Notes

OS: Windows 10

Best Answer

We faced the same problem in a University project and were able to solve it using

wkhtmltopdf

We quite enjoyed the capabilities of this tool on the command line. We also called it from Python code to render the current state of webpages. It can deliver the webpage as PDF, which usually doesn't preserve the website view perfectly because of the page formatting (A4, for example), or as PNG, which preserves the view of the page but not the links.
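As a rough illustration of calling wkhtmltopdf from Python (a minimal sketch, not the code from our project), something like the following could work. It assumes the wkhtmltopdf binary is installed and on PATH. The function names, default values and example URL are hypothetical; `--page-size`, `--zoom` and `--print-media-type` are wkhtmltopdf command-line options.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(url, output_path, page_size="A4", zoom=1.0,
                              use_print_styles=False):
    """Return the argument list for one wkhtmltopdf run (nothing is executed)."""
    command = ["wkhtmltopdf", "--page-size", page_size, "--zoom", str(zoom)]
    if use_print_styles:
        # Render with the page's print stylesheet instead of the screen one.
        command.append("--print-media-type")
    command += [url, output_path]
    return command


def render_to_pdf(url, output_path, **options):
    """Run wkhtmltopdf; raises if the binary is missing or exits non-zero."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    command = build_wkhtmltopdf_command(url, output_path, **options)
    subprocess.run(command, check=True)
```

For example, `render_to_pdf("https://example.com/", "example.pdf", zoom=0.4)` would request the same 0.4 zoom factor mentioned in the question. The result is still laid out on fixed-size pages, so it may not match the browser view exactly.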

There is also the readability project (for Python: pypi.python.org/pypi/readability-lxml), which we used and which does ad removal and content detection quite well (e.g. for newspaper articles and the like). If you just want an add-on or extension for your browser, the following readability implementation might satisfy your need:

Offline now: https://www.readability.com/addons/

WaybackMachine Link: https://web.archive.org/web/20160308192045/https://readability.com/addons