Converting CSS selectors to XPath selectors with C#: css2xpath Reloaded

Almost three years ago I released css2xpath#, a port of Andrea Giammarchi’s project by the same name. Today, I’m excited to announce a new project, css2xpath Reloaded, which will supplant my previous project. css2xpath Reloaded does the same thing that css2xpath# did (convert CSS selectors to XPath selectors), but is instead based on Ian Bicking’s excellent Python app (also somewhat confusingly named css2xpath).

Whereas css2xpath# (and Andrea’s original project) relied on regular expressions to perform the conversion, css2xpath Reloaded recreates Ian’s CSS selector parser.

Although more complicated, a benefit of using a parser is that your CSS selectors can be validated during conversion. It’s also a little bit easier to extend the code. And most importantly (for me at least) it was a lot of fun to write.
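
To illustrate the validation point, here is a minimal sketch in Python. It is not code from css2xpath Reloaded or from Ian's project; the function name css_to_xpath and the tiny supported subset (tag names, #id, .class and the descendant combinator) are my own assumptions for the example. Because the selector is consumed piece by piece, anything the parser doesn't recognize is reported with its position instead of being silently mangled.

```python
import re

# One simple selector: a tag name (or *), an #id, or a .class
_PART = re.compile(
    r"(?P<tag>[A-Za-z_][\w-]*|\*)|#(?P<id>[A-Za-z_][\w-]*)|\.(?P<cls>[A-Za-z_][\w-]*)"
)

def css_to_xpath(selector):
    """Convert a tiny CSS subset (tag, #id, .class, descendant) to XPath."""
    compounds = selector.split()  # whitespace is the descendant combinator
    if not compounds:
        raise ValueError("empty selector")
    steps = []
    for compound in compounds:
        pos, tag, predicates = 0, "*", []
        while pos < len(compound):
            match = _PART.match(compound, pos)
            # A tag name is only valid at the start of a compound selector
            if match is None or (match.group("tag") and pos != 0):
                raise ValueError("unexpected %r at offset %d of %r"
                                 % (compound[pos], pos, compound))
            if match.group("tag"):
                tag = match.group("tag")
            elif match.group("id"):
                predicates.append("[@id = '%s']" % match.group("id"))
            else:
                predicates.append(
                    "[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]"
                    % match.group("cls"))
            pos = match.end()
        steps.append(tag + "".join(predicates))
    return "descendant-or-self::" + "/descendant::".join(steps)
```

For example, css_to_xpath("div#main p") returns descendant-or-self::div[@id = 'main']/descendant::p, while css_to_xpath("div>p") raises a ValueError because this sketch does not know the > combinator.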

I’d consider the code to be beta quality at the moment; it’s lacking in documentation and rigorous testing. However, if you’re interested in testing it out, proceed to BitBucket.

css2xpath Reloaded on BitBucket

VirtualScroller 1.2.1 released

I just released version 1.2.1 of VirtualScroller. This minor update adds two enhancements:

  • Any number of items (at least 1) is supported. If fewer than 6 items are specified, the VirtualScroller falls back on a standard ScrollableView with the fancy scroll logic disabled. This is transparent to developers and users. Resolves issue 3.
  • VirtualScroller automatically detects whether you want finite or infinite scrolling. If the itemCount property is omitted, infinite scrolling is assumed. Otherwise, finite scrolling is used. As such, the infinite property is no longer used. Resolves issue 4.
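
The two rules above boil down to a small decision, sketched here in Python for illustration. This is not VirtualScroller's actual source (the module targets Titanium); the function name resolve_scroll_mode and the returned labels are hypothetical.

```python
MIN_VIRTUAL_ITEMS = 6  # threshold taken from the release notes above

def resolve_scroll_mode(item_count=None):
    """Return which behaviour a scroller would pick for the given item count."""
    if item_count is None:
        return "infinite"   # itemCount omitted: infinite scrolling is assumed
    if item_count < 1:
        raise ValueError("at least 1 item is required")
    if item_count < MIN_VIRTUAL_ITEMS:
        return "standard"   # plain ScrollableView fallback
    return "finite"         # virtual scrolling over a known number of items
```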

Download the latest version
Visit the Wiki


A basic “man-in-the-middle” proxy with Twisted

[OUTDATED – It has been a while since I looked at this, so it’s probably very outdated. Please check the comments.]

I came across a nice example of a Twisted “man-in-the-middle” style proxy on Stack Overflow. This style of proxy is great for logging traffic between two endpoints, as well as modifying the requests and responses that travel between them.

The original was posted here, and I reproduce the vast majority of the code below with some modifications. My real motivation for posting this is to “get the code out there”, because I had a hard time finding it originally. A big thanks to the original author for posting his code on Stack Overflow.

All you need to do is change the three constants at the top, and add whatever validation/modification logic you want in the dataReceived and write methods. Those four methods are labeled so you know which “hop” the data is taking. A request is going to take the following path: client => proxy => server => proxy => client. For example, the first dataReceived method handles data travelling from the client to your proxy.

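#!/usr/bin/env python

# Placeholder values: change these three constants for your setup
LISTEN_PORT = 8000
SERVER_PORT = 8080
SERVER_ADDR = "server address"

from twisted.internet import protocol, reactor

# Adapted from the Stack Overflow answer credited above
class ServerProtocol(protocol.Protocol):
    def __init__(self):
        self.buffer = b""
        self.client = None

    def connectionMade(self):
        # A client connected to the proxy: open our own connection to the server
        factory = protocol.ClientFactory()
        factory.protocol = ClientProtocol
        factory.server = self

        reactor.connectTCP(SERVER_ADDR, SERVER_PORT, factory)

    # Client => Proxy
    def dataReceived(self, data):
        if self.client:
            self.client.write(data)
        else:
            # Not connected to the server yet: hold the data until we are
            self.buffer += data

    # Proxy => Client
    def write(self, data):
        self.transport.write(data)

class ClientProtocol(protocol.Protocol):
    def connectionMade(self):
        self.factory.server.client = self
        # Flush anything the client sent before the server connection was ready
        self.write(self.factory.server.buffer)
        self.factory.server.buffer = b""

    # Server => Proxy
    def dataReceived(self, data):
        self.factory.server.write(data)

    # Proxy => Server
    def write(self, data):
        if data:
            self.transport.write(data)

def main():
    factory = protocol.ServerFactory()
    factory.protocol = ServerProtocol

    reactor.listenTCP(LISTEN_PORT, factory)
    reactor.run()

if __name__ == '__main__':
    main()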
VirtualScroller 1.2 released

It’s been almost a year (exactly a year in seven days), but I finally have a new version of VirtualScroller! Version 1.2 is versioned as a minor update (in the 1.x family), but it contains some significant bug fixes and stability improvements. With the exception of a change in default options (see below), this is a drop-in replacement for versions 1.1.1 and 1.1.

Glitch free, smooth scrolling

Previously, scrolling through a VirtualScroller too fast could leave the control in a transient state. Before version 1.2, I had worked around this issue by essentially limiting the scrolling speed. This resulted in an annoying user experience, because it was impossible to quickly swipe through pages. In fact, it was only possible to scroll through one page at a time, with a short pause in between pages.

This is no longer the case with 1.2. Users should not notice any jittering as they swipe along, thanks to vastly improved scrolling logic and event handling.

Two important internal changes made this possible:

  1. Increased view cache: Previously, only three views were maintained in memory, which meant that the active view was padded on both sides by only a single view. Because of this, it was possible to scroll to the end of the in-memory views before the VirtualScroller had loaded the next views. Version 1.2 works around this problem by using a cache size of five. This makes it harder to outpace the VirtualScroller’s caching. Note: This means that your VirtualScroller itemCount MUST be at least five (or infinite), or else the VirtualScroller constructor will return null.
  2. Simplified scrolling logic: The code, in general, has been heavily refactored and simplified. For example, VirtualScroller now has a much easier check to determine if a scrollEnd event actually resulted in a page advance. As a consequence, there is much less that can go wrong.
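
To make the cache change concrete, here is a rough sketch in Python of which item indices sit in memory around the active view. It is an illustration only, not VirtualScroller's code: the function name cached_indices is hypothetical, and clipping the window at the ends of a finite list is my assumption.

```python
def cached_indices(active, cache_size=5, item_count=None):
    """Return the item indices kept in memory around the active item."""
    if cache_size < 1 or cache_size % 2 == 0:
        raise ValueError("cache_size must be a positive odd number")
    half = cache_size // 2
    window = range(active - half, active + half + 1)
    if item_count is None:
        return list(window)  # infinite scrolling: no bounds to respect
    return [i for i in window if 0 <= i < item_count]  # finite: clip at the ends
```

With the old cache size of three, cached_indices(10, cache_size=3) gives [9, 10, 11], so a single fast swipe already reaches the edge of the cache. With five, the same call gives [8, 9, 10, 11, 12], leaving one more page of headroom in each direction.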

Touch support is assumed

A minor change: previously, the touch option defaulted to false. Since all Google Play apps require touch support, I realized it is more convenient to default it to true instead. In version 1.2, touch now defaults to true.

Updated documentation

The (previously neglected) Wiki has been updated to reflect the new changes. Also, the code example should actually work now.

Download and documentation

Download the latest release here, and get the documentation here. As usual, if you encounter any issues or have any suggestions, please report them here.

Future development

I’m in the process of adding methods for advancing the control forwards and backwards, but I don’t have estimates for when that will be done.


Solving the Timeout=2 error when updating WiFly firmware

While attempting to update the firmware of my WiFly module, I was getting a Timeout=2 error from the ftp update command, even after I had set the new update server (with the help of these instructions). The solution I found was to change the FTP mode from passive to active, and then attempt the update process.

The complete steps I followed, immediately after booting up the module:

  1. $$$   (to get into command mode)
  2. factory RESET
  3. reboot
  4. [ connect to your network – you will need an internet connection to perform the update ]
  5. set ftp address   (the new update server address)
  6. set ftp mode 1
  7. save
  8. ftp update
  9. factory RESET
  10. reboot
  11. When the module reboots, you should see <4.00> at the start of every terminal line.

It is possible that this may work for you without performing step 6. This probably depends on your network and firewall setup. Give it a try first without step 6, and then with step 6 if it doesn’t work the first time.

Store now open!

Since 2009 I have sold my photography and designs on my Zazzle store. Recently, I decided to rebrand the store and somewhat integrate it with my blog. If you are interested, please check it out either by using the new “Store” tab above or by clicking here.

Much of what I sell comes from my interest in nature photography and also martial arts. I’ll spare you a Flash widget, and instead just include a few photos below:

“The Krav Equation” Sports Shirt

Birds 2014 Calendar

BSOD mousepad

Black-chinned hummingbird mousepad

Queen butterfly mousepad

I appreciate the support, even if you only want to take a quick peek.

How to get Amazon Instant Video working in Linux after the recent Flash update

As many have noticed, Flash on Linux was recently upgraded, and the update breaks the ability to use Amazon Instant Video.

The best solution I found is at AskUbuntu. However, there were a few steps missing (such as purging the plugin cache), so I made an edit to add those steps. So, if you are having problems with the new Flash update, check out that link.

Parsing HTML with C++

I was having a hard time finding an HTML parser for my latest C++ project, so I decided to write up a quick summary of what I ended up using.

Revisited! Please see the new article here.

My #1 requirement for a parser was that it had to provide some mechanism of searching for elements. There are a couple of parsers available that only provide SAX-style parsing, which is very inconvenient for all but the simplest of parsing tasks. An ideal API would provide searching using XPath expressions, or something similar.

The only decent sources of information I found were these three questions from Stack Overflow: Library Recommendation: C++ HTML Parser, Parse html using C, and XML Parser for C. Below is a summary of what I considered along with my take on each:

  • QWebElement – Part of the Qt framework. Although it provides a rich API, I couldn’t figure out how to compile any Qt code outside of Qt Creator (I’m using Code::Blocks.)
  • htmlcxx – Standalone, tiny library. I got some code up and running with this library very fast. However, I quickly realized how limited it is (e.g. poor attribute accessors, no way to search for elements.) Limited documentation.
  • Tidy – The classic HTML cleaner/repairer has a built-in SAX-style parser. Simple to use, but like htmlcxx, limited in what it can do.
  • Tidy + libxml++ – Tidy can transform HTML into XML, so all that’s needed is a good XML parser. This was the solution I ended up using.

My final solution was to use Tidy to clean up the markup and convert it into XML. Then, I use libxml++ (a C++ wrapper for libxml) to traverse the DOM. libxml++ supports searching for elements with XPath, so I was happy.

Here’s some sample code demonstrating Tidy and libxml++.

Step 1: Using Tidy to clean HTML and convert it to XML:

#include <stdexcept>
#include <string>

#include <tidy/tidy.h>
#include <tidy/buffio.h>

std::string CleanHTML(const std::string &html){
    // Initialize a Tidy document
    TidyDoc tidyDoc = tidyCreate();
    TidyBuffer tidyOutputBuffer = {0};

    // Configure Tidy
    // The flags tell Tidy to output XML, use numeric entities, and disable showing warnings
    bool configSuccess = tidyOptSetBool(tidyDoc, TidyXmlOut, yes)
        && tidyOptSetBool(tidyDoc, TidyQuiet, yes)
        && tidyOptSetBool(tidyDoc, TidyNumEntities, yes)
        && tidyOptSetBool(tidyDoc, TidyShowWarnings, no);

    int tidyResponseCode = -1;

    // Parse input
    if (configSuccess)
        tidyResponseCode = tidyParseString(tidyDoc, html.c_str());

    // Process HTML
    if (tidyResponseCode >= 0)
        tidyResponseCode = tidyCleanAndRepair(tidyDoc);

    // Output the HTML to our buffer
    if (tidyResponseCode >= 0)
        tidyResponseCode = tidySaveBuffer(tidyDoc, &tidyOutputBuffer);

    // Grab the result from the buffer (bp stays null if nothing was written),
    // then free Tidy's memory so that nothing leaks if we throw below
    std::string tidyResult;
    if (tidyOutputBuffer.bp) {
        if (tidyResponseCode >= 0)
            tidyResult = (char*)tidyOutputBuffer.bp;
        tidyBufFree(&tidyOutputBuffer);
    }
    tidyRelease(tidyDoc);

    // Any errors from Tidy?
    // (std::to_string is needed here: adding an int to a string literal is pointer arithmetic)
    if (tidyResponseCode < 0)
        throw std::runtime_error("Tidy encountered an error while parsing an HTML response. Tidy response code: " + std::to_string(tidyResponseCode));

    return tidyResult;
}


Step 2: Parse the XML with libxml++:
The following code parses the HTML contained in ‘response’ (passing it to CleanHTML first). Then, we search for the element with id ‘some_id’. After outputting how many elements match that criterion (should be 1), we output the line in the XML at which the element occurs. For the sake of saving space, I omit error checking.

#include <iostream>
#include <libxml++/libxml++.h>

xmlpp::DomParser doc;

// 'response' contains your HTML
doc.parse_memory(CleanHTML(response));

xmlpp::Document* document = doc.get_document();
xmlpp::Element* root = document->get_root_node();

xmlpp::NodeSet elemns = root->find("descendant-or-self::*[@id = 'some_id']");
// Number of matches first (should be 1), then the line of the first match
std::cout << elemns.size() << std::endl;
std::cout << elemns[0]->get_line() << std::endl;

Important note about namespaces
Something that took me a while to figure out is that libxml++ requires the full namespace when selecting tags.

This won’t work: *p/text()
But this will: *[local-name() = 'p']/text()

You could also specify the full namespace manually, but I find that local-name() is a much better option.
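
The same pitfall is easy to reproduce outside of libxml++. Here is a small sketch using Python's standard xml.etree.ElementTree rather than libxml++, so the syntax differs from the XPath above. It assumes the cleaned document declares the XHTML namespace on its root element, and the sample markup is made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
doc = ET.fromstring(
    '<html xmlns="%s"><body><p>Hello</p></body></html>' % XHTML
)

plain = doc.findall(".//p")                  # no namespace given: matches nothing
qualified = doc.findall(".//{%s}p" % XHTML)  # fully qualified name: matches
wildcard = doc.findall(".//{*}p")            # any namespace (needs Python 3.8+)

print(len(plain), len(qualified), len(wildcard))  # prints: 0 1 1
```

The {*} wildcard plays roughly the same role here as local-name() does in the XPath expression: it matches on the tag name alone and ignores the namespace.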

Tidy by default doesn’t seem to support HTML5. For a version of Tidy that does, see here:

More info

To compile the example code, I use the g++ flags: `pkg-config --cflags glibmm-2.4 libxml++-2.6 --libs` -ltidy. As the flags suggest, you’ll need the glibmm library in addition to Tidy and libxml++ (and their dependencies.)

See the libxml++ class references:


Happy belated new year!

Happy belated new year! Wow, I’ve been slacking on this blog. Anyway, I wanted to give a quick update on what I’ve been doing over the past month and what I’ll be doing in 2013.

So long Windows!

I’ve switched to Linux! Linux Mint, to be exact. I made the switch during my Christmas break, and after a few weeks of getting adjusted, I don’t think I can ever switch back to Windows again. While I still have a small Windows partition for games (Battlefield 3, anyone?), my day-to-day activities are now carried out exclusively on Linux. There were a few factors that caused me to switch:

  • Python development (which I feel myself gravitating more towards) sucks on Windows
  • I was tired of reinstalling Windows every few years
  • Linux is more secure (of course)

What does this mean for my blog/projects? Well, I think I’ll be spending less time with C#, unfortunately. I’ve installed MonoDevelop and Mono, which I’m looking forward to working with, but the draw of Python may end up being too great. I don’t think I’ll ever give up C#, however.

Less C#, more Python

As I said, I’ll be spending more time with Python. I’m hoping to come up with some good Python posts.

Raspberry Pi!

Over break I acquired my first Raspberry Pi (model B). Currently, its sole purpose is a No-IP client, but I have big plans for it. Sensor gateway, anyone? More on that some other time.


I’ve been using C++ on a very basic level for the last year or two (with Arduino and a few other projects), but this year I’ll be investigating it much more.


And finally, I have a few improvements in mind for VirtualScroller, my Titanium module for implementing memory-efficient infinite (and finite) scrolling. In addition, I’ll probably be playing around with some networking stuff on Titanium.

Well, that’s what I’ve got lined up for now, although I’m sure I’ll get side-tracked during the course of the year and end up working on completely different stuff in addition to what I’ve listed. Either way, 2013 should be fun.

BaseMemberPromotingProxy binary posted on BitBucket

I’ve posted BaseMemberPromotingProxy.dll to the downloads section of the BaseMemberPromotingProxy BitBucket. Now you don’t have to compile the source yourself to use it – just grab the DLL and you’re all set!

Also, to anyone that has been compiling (or modifying) the source, note that I made a few commits that correct a stupid spelling mistake. In some places, BaseMemberPromotingProxy was incorrectly spelled BaseMemberyPromotingProxy.

Enjoy, and remember:

There is no “y” in “member” – Me