Category Archives: Technology

Automatic Translation with Google Language API

Language can be a huge barrier when you are trying to help a multilingual community interact better without intimidating one region or another.

I’ve recently started to investigate what can be done about this challenge, as it is a very real problem for me and our http://www.SolutionExchange.info community platform.  Aggregation of user driven content can be a great thing, but common publication processes like editing and translation are bypassed.  The availability of tools that help an individual publish his or her thoughts and opinions is of course a good thing for the most part: it lets people interact more quickly and easily, removing the barriers that once prevented any kind of sharing or interaction (e.g. you were never able to publicly comment on a newspaper article or spread a story without significant effort and cost).

With a wide and varied community in mind, I investigated the Google Translate API, accessible via the Google AJAX Language API, and started a trial to see how this automated process can help our users gain some context about content that is not written in their mother tongue.  What is particularly useful is that the API can detect the source language automatically, which is great when you have many languages across many sources.

The trial starts on the 6th August 2010 and I would like to run it over the course of a month to see whether this prototype evolves into something valuable for some of our users.  The feature can be seen in the footer of the site http://www.solutionexchange.info and must be invoked manually, as no choices are currently remembered.  This was a deliberate design choice: I wanted users to actively decide to try out the feature rather than be confused by auto-translated content that they had not expected.  Auto-translated content is then shown with a green asterisk appended to indicate that the related text has gone through automatic translation.  Currently, Tweets, Solution Descriptions, and Community Feed items are just some of the sections under trial, but this can easily be extended or refined depending on feedback.
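As a rough illustration of the marking behaviour described above, here is a minimal sketch. The names translateItems and fakeTranslate are hypothetical, and the translate argument is only a synchronous stand-in for the real translation call (which is not reproduced here); it is assumed to return the translated text together with the detected source language.

```javascript
// Marker appended to text that has been machine translated
// (the site uses a green asterisk; styling is left out of this sketch).
const TRANSLATED_MARKER = '*';

/**
 * Translate each item and flag the ones that were actually translated.
 * `translate(text, targetLang)` is a hypothetical stand-in for the real API
 * call and must return { text, detectedSourceLanguage }.
 */
function translateItems(items, targetLang, translate) {
  return items.map(function (original) {
    const result = translate(original, targetLang);
    // Content already in the reader's language is left untouched and unmarked.
    if (result.detectedSourceLanguage === targetLang) {
      return { text: original, translated: false };
    }
    return { text: result.text + ' ' + TRANSLATED_MARKER, translated: true };
  });
}

// Stub translator used only for this example: it "detects" German for one phrase.
function fakeTranslate(text, targetLang) {
  if (text === 'Guten Morgen') {
    return { text: 'Good morning', detectedSourceLanguage: 'de' };
  }
  return { text: text, detectedSourceLanguage: targetLang };
}

const demo = translateItems(['Guten Morgen', 'Hello'], 'en', fakeTranslate);
```

Keeping the translator as an injected function means the marking logic can be tried out without any network access.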

I’d like to develop and improve this trial, so I’d happily take feedback here or through the feedback form on the site at http://www.solutionexchange.info/feedback.htm.

If you have any questions then feel free to pop them in a comment below.


IIS7, Tomcat & Application Request Routing

Further Update: 27th June 2011

Another update on this topic. If you were making use of custom error pages in IIS7 and you implemented the update below, you may have noticed that the custom error commands are no longer being adhered to. To change this, you need to set up custom error pages at a site level by choosing your site, selecting “Error Pages”, then “Edit Feature Settings” from the action menu and then “Custom error pages”.

Important Update: 22nd June 2011

On page 2 of this article (How To Configure IIS 7.0 and Tomcat with the IIS ARR Module), there is a key step that I failed to observe when I wrote the original post below.  The step in question is the enablement of the (reverse) proxy server after the ARR install.  By doing this, you are able to apply rewrite rules at the site level — something I wasn’t able to achieve originally, which meant that the routing rules within my server farm were somewhat overloaded.

With this setting enabled, I can leave a single delegation rewrite rule at the server farm level, telling IIS to delegate HTTP requests of a certain pattern but leave the rewrite rules that are there for beautification at the desired site level.  This is a much tidier and more scalable approach.

One gotcha that you need to be aware of is that the rewrites at the site level need to be absolute URLs.  You could be tempted to put the host of a single Tomcat instance behind IIS directly into these rules, and it would work fine.  For a little future proofing, though, use localhost in all absolute URL site level rewrites.  The site level rewrites are then only responsible for masking ugly application URLs, and the job of request delegation stays with the server farm.  With this approach, the server farm config can be used to bring other Tomcat instances online or take them offline for maintenance without having to change the site level configuration.  In other words, it keeps the various areas of the IIS7 interface focused on the job in hand, allowing for easier administration.

Please keep this update in mind as you read the otherwise unchanged original post below.

Regards,

Dan

After many years of using the Tomcat Connector (http://tomcat.apache.org/connectors-doc/) when setting up Tomcat behind IIS, it is now time to say goodbye.

This is the conclusion that I’ve come to after having some particularly significant challenges using IIS7 on a 64bit Windows 2008 machine.

The traditional approach I’ve used in the past has been to utilise the Tomcat Connector, which is implemented as an ISAPI Filter, to delegate requests from IIS through to Tomcat.  This has worked great for me in the past and was the subject of a previous article (http://bit.ly/lp6zW) but the 64bit system threw in a couple of additional challenges that weren’t so easy to get around.

The problems faced led me to discover Application Request Routing (ARR), an official extension for IIS7, which allows you to define the delegation of requests to servers sitting behind the IIS instance.

What is particularly nice with this extension is the way in which it facilitates the former approach within the GUI, making it easier to understand what is being delegated.  The approach, however, is similar to the ISAPI filter approach: delegating based on URL path patterns.

The following takes you through an overview of how to set this up:

1. Install ARR

You can obtain the appropriate install for the ARR IIS7 extension at http://www.iis.net/download/applicationrequestrouting

Once installed, a ‘Server Farms’ node appears in the left panel of the IIS7 Management Console, which confirms that ARR has installed correctly.

[Figure: ARR Install. The Server Farms node is seen if ARR is installed correctly.]

A number of modules are added as part of this extension.  You can find the details of these from the same ARR link (http://www.iis.net/download/applicationrequestrouting).

2. Create Server Farm

Although the concept of a ‘farm’ of servers may be overkill for our needs of delegating HTTP requests through IIS7 to Tomcat, we shall nevertheless set up a farm containing one server: our Tomcat instance.

To do this:

  1. Highlight the ‘Server Farms’ node in the left panel of the IIS7 Management Console.
  2. Choose ‘Create Server Farm’ from the right hand side action menu.
  3. You will be prompted for a name for the farm.  For my needs in setting up the Open Text Delivery Server behind IIS7, I gave the farm the name ‘Tomcat – Delivery Server’.

    [Figure: ARR Server Farm Name]

  4. You will then be prompted to set up a server in the farm.  In our case, we are just going to select the localhost instance of Tomcat running on port 8080.  To specify the port, open the ‘Advanced settings’.  Strangely, there appears to be no easy way to edit a server’s port once set up, so make sure it is correct; otherwise you will have to delete the server and add a new one.

    [Figure: ARR Add Server. Make sure you open the Advanced settings to edit the port number.]

3. Configure the Routing Rules

Now that we have informed IIS7 about the server that sits behind it, we need to let it know how we wish to delegate HTTP requests to it.  To do this, we choose the newly created Server Farm in the left hand panel and select the Routing Rules feature.

[Figure: ARR Routing Rules]

Within here, we have a few options.  I’ve chosen to keep the defaults of having both checkboxes checked and have no exclusions set, as I am delegating this responsibility to the URL Rewrite rules.

From here, you can add and modify the rewrite rules defining how requests are delegated using the ‘URL Rewrite’ link in the right-hand action panel.

In my case, I chose to change the default rule that was set up for me to a regular expression as opposed to the wildcard default.  However, I only chose this due to personal preference.  The pattern I used for this rule is:

cps(.+)

and I ignore the case.

Finally, I have no Conditions or Server Variables to take note of in my scenario, although they can easily be added here, so I conclude the rule by setting the action to ‘Route to Server Farm’ and choosing my ‘Tomcat – Delivery Server’ farm with a path setting of

/{R:0}

This passes all URL path info through to Tomcat.  I also choose to stop processing of subsequent rules.

4. Refine Rules for your Environment

Lastly, in my setup, I’ve added the following further rules to refine how my site is served through IIS7:

Delegate .htm and .html requests:

Pattern - ([^/]+\.html?)
Action path - /cps/rde/xchg/<project>/default.xsl/{R:1}

Delegate .xml requests:

Pattern - ([^/]+\.xml)
Action path - /cps/rde/xchg/<project>/default.xsl/{R:1}

Delegate default home page:

Pattern - ^/?$
Action path - /cps/rde/xchg/<project>/default.xsl/index.htm
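To see how these patterns and {R:n} back-references behave together, here is a rough JavaScript model of the rule set. It is only a sketch and not how IIS evaluates rules internally. It assumes the rules are tried in the order listed with the first match winning (mirroring "stop processing of subsequent rules"), and it uses 'myproject' as a placeholder for <project>. The names routeRequest and rules are hypothetical.

```javascript
// 'myproject' is a placeholder for the <project> part of the action paths.
const PROJECT_BASE = '/cps/rde/xchg/myproject/default.xsl/';

// Assumed order: the delegation rule first, then the refinements.
// All patterns ignore case, as in the post.
const rules = [
  { name: 'delivery-server', pattern: /cps(.+)/i, action: '/{R:0}' },
  { name: 'html', pattern: /([^\/]+\.html?)/i, action: PROJECT_BASE + '{R:1}' },
  { name: 'xml', pattern: /([^\/]+\.xml)/i, action: PROJECT_BASE + '{R:1}' },
  { name: 'home', pattern: /^\/?$/i, action: PROJECT_BASE + 'index.htm' }
];

// Return the path that would be routed to the server farm,
// or null when no rule matches and the request is not delegated.
function routeRequest(urlPath) {
  for (const rule of rules) {
    const match = rule.pattern.exec(urlPath);
    if (match) {
      // Expand {R:n}: {R:0} is the whole match, {R:1} the first captured group.
      return rule.action.replace(/\{R:(\d+)\}/g, function (_, n) {
        return match[Number(n)];
      });
    }
  }
  return null;
}
```

For example, routeRequest('/level1/level2/index.htm') yields the Delivery Server path ending in index.htm, while routeRequest('/images/logo.png') returns null because no rule matches.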

Summary

Although this approach of using IIS7 in a reverse proxy capacity may not benefit from the efficiencies of the AJP protocol used by the Tomcat Connector, the impact on most sites will be negligible.  In exchange, Tomcat and IIS7 work together in a way where the GUI of the IIS7 Management Console helps admins define and understand what is happening.  The ISAPI Filter approach is often not so visible, partly because of the broad nature of what ISAPI modules can provide and partly because of the configuration required outside of the IIS7 Management Console.

As always, if you have any questions, leave a comment.


Progressive Enhancement via AJAX

I have recently been curious about how a normal web site with various posts and page reloads could be improved through AJAXifying it (yes, I did mean to say that – you get what I mean) – i.e. introducing AJAX calls to improve the smoothness of the user experience and minimise page reloads in key areas of a site.

In particular, my understanding of JS frameworks such as jQuery, based on the examples I’d seen, meant that form submissions were tied to specific knowledge of the form that the event was bound to.  For instance, when you bind the onSubmit event to a particular form, the many examples out there show functions that then pull content from the form through something like the following selector:

var inputVal = $('form input[name=user]').val();

This is OK for specific cases like a registration or contact us form that tends to be unique on a page but what if there were multiple similar forms on a given page and you simply wanted to submit all the form data to the same URL as before with the standard form submission but instead through an AJAX HTTP Request?

Somehow, having this sort of specific knowledge from within the handler function didn’t feel right to me, so I set off with the goal to find out how I can get the related form data from the info that is passed to the event function handler by default, without any extra manual passing of data etc.

This led me to the jQuery Event Object (api.jquery.com/category/events/event-object/), which through the target property provides a reference to the DOM element that initiated the event.  For a form’s submit event that element is the form itself, i.e. the element that I bound the event to.  This provides the key piece of information that I was missing.

Let’s take the following HTML code snippet that has 3 similar forms on a single page as an example:

<form name="form1" action="/getSomething" method="post">
  <input type="text" name="input1" />
</form>
<form name="form2" action="/getSomethingElse" method="post">
  <input type="text" name="input2" />
</form>
<form name="form3" action="/getAnotherThing" method="post">
  <input type="text" name="input3" />
</form>

Taking the above example, we can bind the event to all three forms in one go with the following:

$('form').submit(getSomethingFunction);

Because jQuery passes the event object to the handler function, we can extract the form specifics once inside the function:

function getSomethingFunction(eventObject)
{
    var formName = eventObject.target.getAttribute('name');
    var formAction = eventObject.target.getAttribute('action');
    ...  
    return false;
}

We can then initiate our post request using the jQuery serialize() function (api.jquery.com/serialize/):

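function getSomethingFunction(eventObject)
{
    // The form that fired the submit event
    var formName = eventObject.target.getAttribute('name');
    var formAction = eventObject.target.getAttribute('action');
    // Post the serialized fields of that form to its own action URL
    $.post(formAction, $('form[name="'+formName+'"]').serialize(), callbackFunction, 'json');
    // Prevent the normal full-page form submission
    return false;
}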

As you can see, jQuery simplifies this type of challenge with only a small amount of easy to follow code, allowing you to re-use the same (pre-AJAXified) server side code.

In my real case, I added a ?format=json to the post URL when calling via AJAX so that my server side PHP script knew that it didn’t need to send a full HTML page back as a response and instead sent a JSON formatted success/failure message back.
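A minimal sketch of how that flag could be added to whatever action URL a form declares, assuming the action may already carry a query string or a fragment. The helper name withJsonFormat is hypothetical and is not part of jQuery.

```javascript
// Append format=json to a form action URL, keeping any existing
// query string and any #fragment intact.
function withJsonFormat(action) {
  const hashIndex = action.indexOf('#');
  const base = hashIndex === -1 ? action : action.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : action.slice(hashIndex);
  // Use '&' when the action already has a query string, '?' otherwise.
  const separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + 'format=json' + fragment;
}
```

Inside the submit handler, the post URL would then be withJsonFormat(formAction) rather than formAction, so the plain (non-AJAX) form submission still receives the full HTML page.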

From this small investigation, I’m now interested in understanding what frameworks are out there that facilitate this type of progressive enhancement approach and utilise a widely adopted JS library such as jQuery.  Please leave a comment if you have any tips or advice.


Open Text Delivery Server with a Front Controlling Web Server

Overview

This post discusses the best practice of deploying the Open Text Delivery Server in an optimal way alongside a front controlling web server.

Delivery Server is a dynamic web server component that has strengths in coarse grained personalisation and dynamic behaviour as well as system integration.  Therefore, as it is housed within a Servlet Container, it is not the ideal location from which to serve static content (unless you wish to maintain a level of access control over the static content).

Using a front controlling Web Server facilitates an optimised site deployment, as web servers such as Microsoft’s IIS or the Apache HTTP Server can be used to deliver static content efficiently.  For example, it is easy to configure a far future ‘Expires’ header on a given folder, and therefore its content, within either Apache or IIS.  This promotes the caching of content in a user’s browser, which reduces page load times.  Another example is the use of the mature compression features within such web servers.  Although these examples can be achieved with some Servlet Containers, it is certainly not straightforward and doesn’t necessarily make sense from an architectural perspective.

It is for this architectural reason that best practice dictates that we delegate only the relevant HTTP requests to Delivery Server.  In most cases, this means that Delivery Server is delegated requests for .htm and .xml resources.  The rest can be served from the front controlling web server (or better still a CDN).

This article provides a high-level overview of what to set up.  Depending on feedback, I may write further posts on the details of each step.

Delegating Requests from the Web Server to Delivery Server

This step can be easily achieved using the Tomcat Connector for both IIS and Apache. To find out more see the Tomcat Connector documentation here: http://bit.ly/at1w8G.

This connector uses the Apache JServ Protocol, which connects to port 8009 by default on Tomcat and is optimised to use a single connection between the Web Server and the Delivery Server for many HTTP requests.  Therefore, this represents a better option than using reverse proxy functionality within the Web Server.

If we take a typical Delivery Server install (i.e. the reference install using Tomcat), a page can be accessed with something like the following URL:

http://<host>:8080/cps/rde/xchg/<project>/<xsl_stylesheet>/<resource>

where resource could be any text based file like index.html or action.xml.

The result of correctly installing the Tomcat Connector means that we can access that same resource but through the Web Server on port 80 and not direct to the Tomcat instance on port 8080:

http://<host>/cps/rde/xchg/<project>/<xsl_stylesheet>/<resource>

Many confuse this step with URL rewriting or redirecting as the Tomcat Connector is often called the Jakarta Redirector.  Therefore, I choose to differentiate by saying that this delegates HTTP requests between the two systems and nothing more.

In every install, I have always used the defaults in the workers.properties file and just used the following rule in the uriworkermap.properties file:

/cps/*=wlb

URL Rewriting

With delegation set up, deciding which HTTP requests should be forwarded to Delivery Server is a simple matter of performing some URL rewrites.

As we have decided to use a mature Web Server, there are best practice ways to achieve this.  For IIS6, HeliconTech (http://bit.ly/bgJEF6) created a very useful ISAPI filter which ports the widely adopted Apache mod_rewrite (http://bit.ly/cfvuLD) functionality, so the same rewrite rules can be used on both IIS6 and Apache.  The following provides a couple of typical examples:

# Default landing page redirect
RewriteRule ^/$ /cps/rde/xchg/<project>/<xsl_stylesheet>/index.htm [L]
# Rewrite to delegate all *.html or *.htm HTTP requests to Delivery Server
RewriteRule ^/?.*/(.+\.html?)$ /cps/rde/xchg/<project>/<xsl_stylesheet>/$1 [L]
# Rewrite to delegate all *.xml HTTP requests to Delivery Server
RewriteRule ^/?.*/(.+\.xml)$ /cps/rde/xchg/<project>/<xsl_stylesheet>/$1 [L]

Those of you who are well versed in regular expressions will see that the last two rules could be combined but I tend to leave them separate to aid readability.

The beauty of using regular expressions in this way is that you can also gain useful SEO benefits for your site. Take for example the following rule:

RewriteRule ^/?.*/([0-9a-zA-Z_]+)$ /cps/rde/xchg/<project>/<xsl_stylesheet>/$1.htm [L]

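This rule maps an extensionless URL with many apparent subdirectories to the Delivery Server file.  (The same path ending in page.htm also works, but it is the *.htm rule above that catches it, because this pattern does not allow a dot in the final segment.)  This means that you can publish a page with a “virtual” path within the Management Server which appears to a browser (and search engines) as something like the following:

http://<host>/this/is/a/descriptive/directory/structure/page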

and yet this maps to:

/cps/rde/xchg/<project>/<xsl_stylesheet>/page.htm
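The effect of these rules can be checked quickly outside the web server. The following JavaScript sketch applies the same four patterns in order and returns the rewritten path; it models only the pattern matching, not mod_rewrite itself. It assumes the first matching rule wins (as [L] suggests) and uses 'myproject' and 'default.xsl' as placeholders for <project> and <xsl_stylesheet>. The names rewrite and rewriteRules are hypothetical.

```javascript
// Placeholders for <project> and <xsl_stylesheet>.
const TARGET = '/cps/rde/xchg/myproject/default.xsl/';

// The same patterns as the RewriteRule lines, tried from top to bottom.
const rewriteRules = [
  { pattern: /^\/$/, substitution: TARGET + 'index.htm' },
  { pattern: /^\/?.*\/(.+\.html?)$/, substitution: TARGET + '$1' },
  { pattern: /^\/?.*\/(.+\.xml)$/, substitution: TARGET + '$1' },
  { pattern: /^\/?.*\/([0-9a-zA-Z_]+)$/, substitution: TARGET + '$1.htm' }
];

function rewrite(urlPath) {
  for (const rule of rewriteRules) {
    if (rule.pattern.test(urlPath)) {
      // String.prototype.replace expands $1 to the first captured group.
      return urlPath.replace(rule.pattern, rule.substitution);
    }
  }
  // Unmatched requests are left for the web server to serve itself.
  return urlPath;
}
```

For example, rewrite('/this/is/a/descriptive/directory/structure/page') returns the Delivery Server path ending in page.htm, while a static file such as '/images/logo.png' is returned unchanged.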

IIS7

Being a Microsoft product, IIS7 has some quirks with regards to the rewriting (of course), which I explained in a previous post: http://bit.ly/lp6zW.

Summary

This approach has led to many successful installations where sites could additionally be optimised for SEO and page load.


Fun and Games with IIS7 and Tomcat Connector (Jakarta)

I needed to run Tomcat behind IIS and delegate html page requests as well as xml page requests from IIS 7 through to Tomcat.

As I had done this a number of times with IIS6 and Helicon’s rewrite tool, I thought this would be easy… Oh no.  I was wrong.

Therefore, here are a few things to be aware of:

  • Make sure you have installed the Tomcat Connector OK.  I found this site to be useful here for the Tomcat Connector install on IIS 7: http://www.iisadmin.co.uk/?p=72
  • Install the ‘URL Rewrite’ module within IIS 7.  I didn’t do this myself, not because it was difficult… I just simply didn’t do it and would guess there is enough info on how to do this.
  • Configure all rewrite rules at the server level and not at the individual site level.  This is because there is some issue around chaining an HTTP request through an ISAPI filter like the Tomcat Connector and then the URL rewrite module or vice versa.  I even tried another ISAPI based rewriter like the one from Helicon (http://www.helicontech.com), resulting in the same challenge: the rewrites and redirect (delegation, not to be confused with a 301 type redirect) worked independently but not together.
  • The syntax for the rewrite rules is different from Apache’s mod_rewrite, e.g.:
  1. ^/?.*/(.+\.html?)$ would normally handle input urls like /index.htm or /level1/level2/index.htm but this was not liked by the URL Rewrite module in IIS 7.  What worked instead was ([^/]+\.html?).
  2. Remember to use {R:1} syntax for getting the back references instead of $1 etc.
  3. Remember to use conditions if you have multiple sites in the same instance utilising the {HTTP_HOST} and {SERVER_PORT} strings.
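When porting existing mod_rewrite rules, the back-reference change can be done mechanically. The sketch below converts a substitution string from mod_rewrite syntax to URL Rewrite syntax: $N (groups from the rule pattern) becomes {R:N}, and %N (groups from a condition) becomes {C:N}. The function name toIisBackrefs is hypothetical, and the patterns themselves still need reviewing by hand, as the first point in the list shows.

```javascript
// Convert mod_rewrite back-references in a substitution string
// to IIS URL Rewrite syntax.
// Limitation: a literal percent-encoded sequence such as %20 would also be
// picked up by the %N replacement, so this only suits simple substitutions.
function toIisBackrefs(substitution) {
  return substitution
    .replace(/\$(\d)/g, '{R:$1}')  // rule pattern groups: $1 -> {R:1}
    .replace(/%(\d)/g, '{C:$1}');  // condition groups:    %1 -> {C:1}
}
```

Server variables written as %{HTTP_HOST} in mod_rewrite are not touched by this helper, because the % is followed by a brace rather than a digit.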

Let me know if you found this useful or if there are other useful references to add into this.  I personally found it difficult to dig up something similar.
