Friday, June 19, 2009

Haunted by Headers

Had an interesting problem today tracing the origin of some HTTP headers. An application which uses mod_plsql (non-APEX) was constantly returning some troublesome header information which we needed to remove, particularly when a certain procedure was called. I searched every where in the OAS interface, but none of the settings seemed to be applicable. Eventually I found the culprit was an apache declaration in dads.conf. Why it was there I can only speculate, not having built the application.

On the up side, I can now clarify some more of the behaviour of the undocumented htp.init procedure. During the process of tracing the problem, I attempted to use htp.init to suppress the headers from showing up when calling a particular procedure. Apparently this isn't how it works. The application server still amends header information to the response before it is sent.

Monday, June 15, 2009

Building a Scraper Using UTL_HTTP

Someone set me the task of embedding pages from another one of our online applications into regions of an APEX app for the purpose of maintaining a single navigation and look and feel. My first thought was that it can't be done. As far as I was aware you couldn't perform a GET request from PL/SQL (I'd breifly checked in the past and that was the conclusion I'd arrived at). So instead I quickly threw something together in PHP that scraped the application and then used str_replace to embed additional style/navigation and achieve a pretty good compromise. It was quick and dirty, and getting it to work on the PHP4 version that comes with APEX was a bit of a challenge. It wasn't really an ideal solution, so I kept digging and found the UTL_HTTP package.

It's not quite as simple as calling a function that returns the page content, but it's possible. I followed the documentation as well as an O'Reilly example. The simple package I built doesn't handle proxies or authentication, but it does do a better job at handling large pages as CLOB data. I've pulled out the exception handling to make it simpler to understand. Refer to O'Reilly's example when you are ready for that.

CREATE OR REPLACE PACKAGE WEB_SCRAPER AS
FUNCTION HTTP_GET_PAGE(p_url in VARCHAR2) RETURN CLOB;

FUNCTION DOCUMENT_BODY(p_document in CLOB) RETURN CLOB;

FUNCTION HTTP_GET_PAGE_BODY(p_url in VARCHAR2) RETURN CLOB;

PROCEDURE PRINT_CLOB(p_content in CLOB);
END WEB_SCRAPER;
/

CREATE OR REPLACE PACKAGE BODY WEB_SCRAPER AS
FUNCTION HTTP_GET_PAGE(p_url in VARCHAR2) RETURN CLOB AS
/* Performs a simple HTTP GET request and returns the content */
l_request Utl_Http.req;
l_response Utl_Http.resp;
l_content_buffer VARCHAR2 (32767);
l_page CLOB;
BEGIN
DBMS_LOB.createtemporary(l_page, FALSE);

l_request := Utl_Http.begin_request (url => p_url, method => 'GET');

Utl_Http.set_header (r => l_request, NAME => 'User-Agent', VALUE => 'APEX-Scraper/0.1');
l_response := Utl_Http.get_response (r => l_request);

BEGIN
LOOP
UTL_HTTP.read_text(l_response, l_content_buffer, 32767);
DBMS_LOB.writeappend (l_page, LENGTH(l_content_buffer), l_content_buffer);
END LOOP;
EXCEPTION WHEN Utl_Http.end_of_body THEN
NULL;
END;

Utl_Http.end_response (r => l_response);
RETURN l_page;
END HTTP_GET_PAGE;

FUNCTION DOCUMENT_BODY(p_document in CLOB) RETURN CLOB AS
/* Return everything between the body tags */
BEGIN
RETURN REGEXP_REPLACE(p_document,'^.*<[[:space:]]*BODY[^>]*>(.*)<[[:space:]]*/[[:space:]]*BODY[[:space:]]*>.*$','\1',1,0,'imn');
END DOCUMENT_BODY;

FUNCTION HTTP_GET_PAGE_BODY(p_url in VARCHAR2) RETURN CLOB AS
/* Convienience function combining DOCUMENT_BODY and HTTP_GET_PAGE */
BEGIN
RETURN DOCUMENT_BODY(HTTP_GET_PAGE(p_url));
END HTTP_GET_PAGE_BODY;

PROCEDURE PRINT_CLOB(p_content in CLOB) AS
/* Use this instead of htp.p which only supports Varchar2 */
l_buffer_size Number := 8191; --Can't always use 32767. See bug #5896994
l_clob CLOB := p_content;
l_chunks Number;
l_cloblen Number;
BEGIN
l_cloblen := dbms_lob.getlength(l_clob);
l_chunks := floor(l_cloblen / l_buffer_size) + 1;
FOR i IN 0 .. l_chunks LOOP
htp.p(dbms_lob.substr(l_clob, l_buffer_size, (i * l_buffer_size) + 1));
END LOOP;
END PRINT_CLOB;

END WEB_SCRAPER;
/

Tuesday, June 9, 2009

Eliminating Bugs

Fantastic article...
http://duartes.org/gustavo/blog/post/of-aviation-crashes-and-software-bugs

Gustavo talks about eliminating software bugs by eliminating the possibility to create them.

I think Apex does a pretty good job on this front, as long as you don't start putting "execute immediate" statements in PL/SQL blocks. Not so good at preventing XSS though. Although it does have the "Strip HTML" option set by default in reports, it's over zealous and you often have to switch it off because it can't handle "<" and ">" symbols in legitimate text (really it should just convert them to harmless &lt; and &gt;). Normal display as text items have the "escape special characters" option, but this has the same problem.

In fact, Oracle already has the function required to render user data safely... HTF.escape_sc. It's probably a good idea to wrap every item in the report select statement with this, or create a view that serves this purpose and only use the view when displaying user data. I've toyed with using the function to sanitise user input, but the problem there is each time you save the same form, the escape sequences get longer and longer (ie. & becomes &amp; becomes &amp;amp;).

They should smarten that function up and include a default input sanitisation process for every form, then permit you to downgrade it's paranoia level for exceptional situations. Allowing for exceptional circumstances goes against what the article is saying. But the author doesn't discuss the downsides and tradeoffs of a Nazi development environment. I don't think it's possible to have an environment that completely guards against writing buggy code. It's just another impossible ideal. But there's nothing wrong with making steps towards it.