taintedSoul · New Member · Joined Mar 17, 2011 · Messages: 1
I am trying to build an automated spider that compares two versions of the 'same' web page by extracting the HTML source of each URL (old and new). There should be no visual differences, but hidden metadata and URLs may differ, so a strict 1:1 comparison won't work. Instead I would like to either count the differences or skip the known ones (server names, domains, other minor changes, etc.).
I have the code to store the HTML into string objects that are ready to compare:
Code:
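strPage1 = GetPageHTML(strFullURL, intTimeout) ' HTML source of the old page
strPage2 = GetPageHTML(strFullURL, intTimeout) ' HTML source of the new page (strFullURL presumably points at the new URL by this call)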
I would like to compare strPage1 and strPage2, keep comparing after the first difference is found, and write the number of discrepancies to column E on my worksheet. Any ideas?
Example of strPage1:
Code:
<html>
<head>
<meta name="server" content="server01">
<meta name="domain" content="google.com">
<script type="text/javascript" src="http://google.com/file1.js"></script>
</head>
<body>
<p><a href="http://google.com/searchResults/test.html">Test URL</a></p>
</body>
</html>
Example of strPage2:
Code:
<html>
<head>
<meta name="server" content="server02">
<meta name="domain" content="bing.com">
<script type="text/javascript" src="http://bing.com/file1.js"></script>
</head>
<body>
<p><a href="http://bing.com/searchResults/test.html">Test URL</a></p>
</body>
</html>
Ideally this would return "4" (for the four differing lines), highlight the differences, or report a percentage of how different the two strings are.
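For illustration, here is a minimal sketch of a line-by-line count, assuming both pages keep the same line order. `CountLineDifferences` and `WriteDifferenceCount` are hypothetical helper names, not part of the original code, and this is only one possible approach rather than a full diff.

```vba
Option Explicit

' Hypothetical helper: counts the line positions at which the two pages differ.
' It keeps going after the first mismatch. The optional lngTotalLines argument
' receives the number of line positions compared, so a percentage can be derived.
Public Function CountLineDifferences(ByVal strPage1 As String, _
                                     ByVal strPage2 As String, _
                                     Optional ByRef lngTotalLines As Long) As Long
    Dim arrLines1() As String
    Dim arrLines2() As String
    Dim strLine1 As String
    Dim strLine2 As String
    Dim lngMax As Long
    Dim lngDiffs As Long
    Dim i As Long

    ' Normalize line endings so CRLF vs. LF is not counted as a difference
    strPage1 = Replace(Replace(strPage1, vbCrLf, vbLf), vbCr, vbLf)
    strPage2 = Replace(Replace(strPage2, vbCrLf, vbLf), vbCr, vbLf)

    arrLines1 = Split(strPage1, vbLf)
    arrLines2 = Split(strPage2, vbLf)

    ' Walk up to the end of the longer page
    lngMax = UBound(arrLines1)
    If UBound(arrLines2) > lngMax Then lngMax = UBound(arrLines2)

    For i = 0 To lngMax
        ' A line missing from the shorter page is treated as an empty line
        strLine1 = ""
        strLine2 = ""
        If i <= UBound(arrLines1) Then strLine1 = Trim$(arrLines1(i))
        If i <= UBound(arrLines2) Then strLine2 = Trim$(arrLines2(i))

        If StrComp(strLine1, strLine2, vbBinaryCompare) <> 0 Then
            lngDiffs = lngDiffs + 1
        End If
    Next i

    lngTotalLines = lngMax + 1
    CountLineDifferences = lngDiffs
End Function

' Hypothetical usage: writes the count to column E of the given row
Public Sub WriteDifferenceCount(ByVal lngRow As Long, _
                                ByVal strPage1 As String, _
                                ByVal strPage2 As String)
    Dim lngTotal As Long
    ActiveSheet.Cells(lngRow, "E").Value = _
        CountLineDifferences(strPage1, strPage2, lngTotal)
End Sub
```

For the two sample pages above, this should count 4 differing lines out of 10. Because the comparison is positional, a single inserted or deleted line would make every following line look different; a proper diff algorithm would be needed to handle that case.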
I wasn't able to find anything similar by googling, but if something like this already exists, I apologize.
Thanks in advance for any advice! Other suggestions for automating this same type of thing are definitely welcome.
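One way to skip the known differences (server names, domains) might be to mask them before comparing. The sketch below assumes the expected-to-differ values are known in advance for each page; `MaskKnownDifferences` and the `{IGNORED}` placeholder are hypothetical names chosen for illustration.

```vba
Option Explicit

' Hypothetical helper: replaces every known, expected-to-differ token
' with the same fixed placeholder, so those spots compare as equal.
Public Function MaskKnownDifferences(ByVal strHtml As String, _
                                     ByVal varTokens As Variant) As String
    Dim i As Long

    For i = LBound(varTokens) To UBound(varTokens)
        ' vbTextCompare makes the match case-insensitive
        strHtml = Replace(strHtml, CStr(varTokens(i)), "{IGNORED}", 1, -1, vbTextCompare)
    Next i

    MaskKnownDifferences = strHtml
End Function

' Hypothetical usage with the values from the sample pages
Public Sub MaskSamplePages(ByRef strPage1 As String, ByRef strPage2 As String)
    strPage1 = MaskKnownDifferences(strPage1, Array("server01", "google.com"))
    strPage2 = MaskKnownDifferences(strPage2, Array("server02", "bing.com"))
End Sub
```

With the sample pages, masking these tokens leaves the two strings identical, so any difference that remains after masking is one that was not expected.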