[VBA] Find Values In HTML Tag Attributes From HTML String

Thursday, March 6, 2014 5:36 pm
Andy Boot

I wrote this function as part of an email-cleansing VBA script for Outlook 2010. The main script scanned through all emails in each folder and decided whether the contents were confidential; any email that was flagged was moved to a separate folder. The script flagged an email when its text matched a regular expression or when it contained a screenshot (a screenshot may contain customer information!).

However, we discovered that email signatures were detected as “screenshots”, so we needed a method to exclude signatures from detection.

The following function scans an HTML string for an image (<IMG>) tag whose source (src) contains a given image file name. It returns True when that image's alt text contains none of the signature keywords, and False when it does or when no image matches. Outlook names embedded files image001.jpg, image002.jpg, etc., which makes them difficult to filter by name alone, so make sure your code can scan through all attachments and output each file name.

Just for reference, a signature generated by Outlook looks like this:

<img border=0 width=239 height=57 id="Picture_x0020_1" src="cid:image001.jpg@01CF395C.6650A790" alt="Description: Description: company_logo_cmyk min size">

Notice that our alt attribute contains a description which is unique to this image file, so we have at least something to work with. Should your images not contain a description, you might be able to get away with excluding attachments / images based on their file size (bytes) and image dimensions (based on the width & height attributes).

' Requires a reference to the Microsoft HTML Object Library (Tools > References).
' Returns True when the image whose src contains fileName has alt text without
' any of the signature keywords, i.e. it is probably not a signature image.
Private Function FindAltKeywordsInImgs(ByVal body As String, ByVal fileName As String) As Boolean

    Dim resultClasses As Object ' MSHTML.IHTMLElementCollection
    Dim resultClass As Object   ' MSHTML.IHTMLElement
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = body
    Set resultClasses = html.getElementsByTagName("img")

    Dim alt As String
    Dim src As String

    For Each resultClass In resultClasses
        ' Append "" so a missing attribute (Null) becomes an empty string
        alt = resultClass.getAttribute("alt") & ""
        src = resultClass.getAttribute("src") & ""

        If InStr(1, src, fileName) <> 0 Then
            ' "logo" and "companyname" are the keywords that identify a signature image
            If (InStr(1, alt, "logo") <> 0 Or InStr(1, alt, "companyname") <> 0) Then
                FindAltKeywordsInImgs = False
            Else
                FindAltKeywordsInImgs = True
            End If
        End If
    Next resultClass

End Function
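
For completeness, here is a minimal sketch of how the function might be called from the main script. It assumes the code lives in the same Outlook VBA module as FindAltKeywordsInImgs. The name MailHasNonSignatureImage is hypothetical and was not part of the original script; it only illustrates passing the email's HTML body and each attachment's file name to the function.

```vba
' Hypothetical caller (not from the original script): returns True if any
' attachment of the mail item is referenced by an <img> tag whose alt text
' lacks the signature keywords.
Private Function MailHasNonSignatureImage(ByVal mail As Outlook.MailItem) As Boolean
    Dim att As Outlook.Attachment

    For Each att In mail.Attachments
        ' att.FileName is e.g. "image001.jpg"; the src in the body
        ' looks like "cid:image001.jpg@01CF395C.6650A790"
        If FindAltKeywordsInImgs(mail.HTMLBody, att.FileName) Then
            MailHasNonSignatureImage = True
            Exit Function
        End If
    Next att
End Function
```

Because FindAltKeywordsInImgs also returns False when no <img> tag references the file, attachments that are not embedded in the body are ignored by this sketch and would need a separate check.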