
Webpage Content Analyser v0.1.1

I’ve just updated the webpage content analyser, as the version I originally uploaded was a development version and didn’t actually work!

The new version still needs tweaking, but it gives you an idea of the kind of output it can produce. The invocation syntax and the download link are unchanged; both can be found on the web content analyser page in the right-hand menu.


Webpage Content Analyser

At work we use Dansguardian to provide content-based filtering for our users. In a few instances we found that Dansguardian’s default phraselists were blocking content we wanted our users to have access to, so we needed to modify them.

We had a problem though: we didn’t know which words were common across a set of pages, so we had nothing to base changes to our phraselists on. So I wrote a tool, WPA, that takes a webpage, strips all the formatting and gives a list of the most common phrases and words.
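The idea can be sketched in a few lines of Java. This is only an illustration of the approach described above, not the code that ships in WPA.jar: the class name `PhraseCounter`, its method names and the regex-based tag stripping are all assumptions made for the example, and a real tool would want a proper HTML parser.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: strip markup, then count the most common words and phrases.
public class PhraseCounter {

    // Crude markup removal (an assumption for this sketch): drop script/style
    // bodies first, then replace any remaining tags with a space.
    public static String stripMarkup(String html) {
        return html
            .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
            .replaceAll("(?s)<[^>]*>", " ");
    }

    // Lower-case the text and split it into words on anything that is not
    // a letter, digit or apostrophe.
    public static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9']+"))
            .filter(w -> !w.isEmpty())
            .collect(Collectors.toList());
    }

    // Count every run of n consecutive words (n = 1 gives plain word counts).
    public static Map<String, Integer> countPhrases(List<String> words, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= words.size(); i++) {
            String phrase = String.join(" ", words.subList(i, i + n));
            counts.merge(phrase, 1, Integer::sum);
        }
        return counts;
    }

    // Most frequent entries first; ties are broken alphabetically.
    public static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample page, standing in for a fetched site.
        String html = "<html><body><h1>Free games</h1>"
            + "<p>Play free games online. Free games daily!</p></body></html>";
        List<String> words = tokenize(stripMarkup(html));
        System.out.println("Words:   " + top(countPhrases(words, 1), 3));
        System.out.println("Phrases: " + top(countPhrases(words, 2), 3));
    }
}
```

Counting two-word runs as well as single words matters here, because phraselists work on phrases rather than isolated words.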

To use WPA do the following:

java -jar WPA.jar site1 site2 site3 …

You can also use it to see which words you should be banning if you have a set of bad pages you don’t want your users viewing. Your blocked domains list is a good input for this, as the results help you block access to similar content:

cat /etc/dansguardian/lists/global-block/domains | java -jar WPA.jar

All you need to do then is decide what phrases to add to your weighted phrase lists!
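The two invocation styles shown above (sites as arguments, or a list piped in on standard input) could be handled along these lines. This is a guess at one way to do it rather than WPA's actual code: `TargetReader` and `readTargets` are hypothetical names, and skipping blank lines and lines starting with `#` is an assumption made so that commented list files pass through cleanly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: take sites from the command line, or from stdin if none are given.
public class TargetReader {

    public static List<String> readTargets(String[] args, Reader stdin) {
        // Command-line arguments win: java -jar WPA.jar site1 site2 ...
        if (args.length > 0) {
            return new ArrayList<>(Arrays.asList(args));
        }
        // Otherwise read one target per line, e.g. from a piped domains list.
        List<String> targets = new ArrayList<>();
        try {
            BufferedReader in = new BufferedReader(stdin);
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                // Assumption: ignore blank lines and '#' comment lines.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                targets.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return targets;
    }

    public static void main(String[] args) {
        for (String target : readTargets(args, new InputStreamReader(System.in))) {
            System.out.println(target);
        }
    }
}
```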

Of course the beauty of this program is that it’s cross-platform thanks to Java, so you’ll be able to run it on Linux, Windows or Mac. If you want the source code, it’s all bundled up in the jar.

To get hold of the jar, click here.

CAP Work


I’m in the throes of writing a little two-part application for work that lets users see which computers are available, and lets us, the administrators, see who is, and more importantly who was, logged on, when and where.

The technical details of proposed implementation can be found here.

The client will be written in Java and the server (display) in PHP.


Cobweb 2

I’m thinking about rewriting (read: recoding) my dissertation project as a sideline programming project.

The final version of Cobweb that I coded for my dissertation was okay, but it didn’t work particularly well. I think with some refinement and some decent HCI analysis for the frontend Cobweb could become a handy service orientated collaboration tool.

If you want to read my original dissertation, it’s available here.

I’m currently trying to decide whether to brave writing the application in C#, which I’ve never used for anything serious, or to use Java, which I haven’t used in anger for a good year and a half.