Did you know that NetBase users can subscribe to daily or weekly reports for dashboards they are interested in? Every hour, our servers generate a number of web page snapshots in PDF format and send them to each subscriber.

It is easy to generate an email in HTML format using a Java-based server and send emails hourly. However, what about generating a snapshot picture on those non-GUI machines?

The Bottleneck

To generate a dashboard snapshot, our service needs to follow these steps:

1.  Load dashboard metadata from the database
2.  Run searches to collect information
3.  Render the web page according to the layout and the data just collected
4.  Get the HTML code of the web page
5.  Generate a PDF file according to the HTML code

Steps 1 and 2 are already covered by our service. Step 5 is handled by an open source tool, wkhtmltopdf, which we already use to export to PDF. That leaves steps 3 and 4: how do we render a web page automatically without any GUI?
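To make step 5 concrete, here is a minimal sketch of calling wkhtmltopdf from a script. It is written for Node.js purely for illustration (our real service is Java-based, and this is not its code). The helper names buildPdfArgs and htmlToPdf are hypothetical, and the sketch assumes the wkhtmltopdf binary is on the PATH and is called in its basic form, wkhtmltopdf input.html output.pdf.

```javascript
// Sketch only: runs under Node.js, not inside SlimerJS.
const { execFile } = require('child_process');

// Build the argument list for `wkhtmltopdf <input.html> <output.pdf>`.
// (Hypothetical helper; kept separate so it can be checked without the binary.)
function buildPdfArgs(htmlPath, pdfPath) {
  if (!htmlPath || !pdfPath) {
    throw new Error('both an HTML input path and a PDF output path are required');
  }
  return [htmlPath, pdfPath];
}

// Convert an HTML file to a PDF file; resolves with the PDF path on success.
// Assumption: `binary` names a wkhtmltopdf executable reachable on the PATH.
function htmlToPdf(htmlPath, pdfPath, binary = 'wkhtmltopdf') {
  return new Promise((resolve, reject) => {
    execFile(binary, buildPdfArgs(htmlPath, pdfPath), (err) => {
      if (err) reject(err);
      else resolve(pdfPath);
    });
  });
}

// Example call (not executed here):
// htmlToPdf('dashboard.html', 'dashboard.pdf').then(console.log, console.error);
```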

SlimerJS, a Scriptable Browser for Web Developers

We knew we needed a virtual browser that can navigate the Internet and render web pages in memory. Such a browser needs no monitor and never has to draw anything on a screen.

PhantomJS and SlimerJS are both virtual browsers. They can export the “screenshot” as an image file, a PDF file, or even HTML code. Both of them export good-quality image files, but their PDF results are not as good as the image results.

We decided to generate HTML code via a virtual browser and use wkhtmltopdf to transform the HTML code to PDF files. Although PhantomJS and SlimerJS can both dump HTML code, their results are not quite the same. After comparing the final results for each, we chose SlimerJS as our final solution.

To open a web page using SlimerJS, we wrote a JavaScript file, called a JS script, to control SlimerJS. Here is a simple JS script sample (open.js):

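// Create a page object and set the size of the virtual viewport.
var page = require('webpage').create();
page.viewportSize = {width: 650, height: 320};
page.open('http://app.netbase.com/')
    .then(function () {
        // Save only the visible viewport as an image, then quit SlimerJS.
        page.render('page.png', {onlyViewport: true});
        slimer.exit();
    });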

Executing slimerjs open.js generates a page.png file, which is the web page snapshot.

Getting HTML Code Using SlimerJS

In our pipeline, we have to dump HTML code for wkhtmltopdf to generate PDF files. Although the JS script API has no method that dumps the HTML of a single container directly, SlimerJS runs JavaScript inside the page, which means we can use the HTML DOM querySelector() method to get the outer HTML code of the specific container. Here is sample JS code:

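var page = require('webpage').create();
page.viewportSize = {width: 650, height: 320};
page.open('http://app.netbase.com/').then(function () {
    // page.evaluate() runs inside the page and can only return serializable
    // values, so return the outerHTML string rather than the DOM element.
    var html = page.evaluate(function () {
        var el = document.querySelector('div[class="login login-header"]');
        return el ? el.outerHTML : '';
    });
    console.log(html);
    slimer.exit();
});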

The console.log() call prints the HTML code, which is what wkhtmltopdf needs.
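The printed HTML still has to reach wkhtmltopdf somehow. One possible way, sketched here for Node.js, is to run the JS script as a child process, capture what it prints, and save it as a standalone HTML file. This is an illustration, not our production code. The names runSlimer and saveFragment are hypothetical, slimerjs is assumed to be on the PATH, and the sketch assumes the script prints nothing but the HTML to standard output.

```javascript
// Sketch only: runs under Node.js, not inside SlimerJS.
const { execFile } = require('child_process');
const fs = require('fs');

// Run `slimerjs <script>` and resolve with everything it wrote to stdout.
// Assumption: the script's only console output is the HTML we want.
function runSlimer(scriptPath, binary = 'slimerjs') {
  return new Promise((resolve, reject) => {
    // Dashboards can be large, so raise the default stdout buffer limit.
    execFile(binary, [scriptPath], { maxBuffer: 16 * 1024 * 1024 }, (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}

// Wrap an HTML fragment in a minimal document and write it to disk,
// so the converter receives a complete page with an explicit charset.
function saveFragment(fragment, filePath) {
  const doc =
    '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n' +
    fragment.trim() +
    '\n</body></html>\n';
  fs.writeFileSync(filePath, doc, 'utf8');
  return filePath;
}

// Example call (not executed here; dump.js is a hypothetical script name):
// runSlimer('dump.js').then((html) => saveFragment(html, 'dashboard.html'));
```

Note that a fragment saved this way carries no stylesheets from the original page, so any CSS the container depends on would have to be added to the wrapper.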

Parallel Processing

Because SlimerJS is a non-GUI browser that can dump HTML code, rendering can run on a machine of its own. We can use Server A to run only SlimerJS and wkhtmltopdf and use Server B to run searches and collect data, so the two kinds of work proceed in parallel. Here is a diagram of the workflow:

Now It’s Your Turn

I’ve briefly described here what SlimerJS can do. During the development process, we also encountered these difficult problems:

  • How to log in to the web page as a given user
  • How to know when the page is loaded completely
  • What to do if the search result doesn’t come back for a long time

We were able to solve these problems using JS script and a little more JavaScript code. We recommend that you try solving these issues to practice using SlimerJS. Now, it's time for you to code!
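If you want a starting point for the last two problems, here is a small polling helper with a timeout. It is only a sketch of one common approach, not necessarily how we solved it. The name waitFor is hypothetical, and the code uses nothing beyond Promise and setTimeout, so it can be tried out under Node.js before being adapted to a JS script.

```javascript
// Poll `condition` every `intervalMs` milliseconds until it returns a truthy
// value, or reject once `timeoutMs` milliseconds have passed.
function waitFor(condition, timeoutMs, intervalMs) {
  return new Promise(function (resolve, reject) {
    var start = Date.now();
    (function poll() {
      var result;
      try {
        result = condition();
      } catch (err) {
        reject(err);
        return;
      }
      if (result) {
        resolve(result);            // the page (or the search) is ready
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error('timed out after ' + timeoutMs + ' ms'));
      } else {
        setTimeout(poll, intervalMs);
      }
    })();
  });
}

// Possible use inside a JS script (not executed here). The '.chart' selector
// is a made-up example of an element that appears only when loading is done:
// waitFor(function () {
//   return page.evaluate(function () {
//     return document.querySelector('.chart') !== null;
//   });
// }, 30000, 250).then(dumpHtml, giveUp);   // dumpHtml and giveUp are hypothetical
```

Rejecting on timeout, rather than waiting forever, lets the caller decide whether to retry, send the report without that widget, or skip the snapshot.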

We’d love to hear your successes and challenges. Connect with us here or feel free to comment below.
