cedricthecat
Active Member
- Joined
- May 17, 2007
- Messages
- 460
Hi folks
I've got to analyse some network usage logs at work. IT have sent this to me as a PDF, which is 812 pages long....
I've managed to get this into an Excel Workbook, but need some way to extract web sites visited. An example of the data is here, but bear in mind that the real file has nearly 30,000 rows!
As you will see, there's a lot of extraneous data ("Site Opened", "Site Closed" etc) so I'm looking for a way to put in a helper column to show just web addresses, which I can then sort and filter, to hopefully end up with just a short list of sites
I was thinking that if there was some way to search a string to find ".com", ".co.uk" and ".net" this should cover most bases, but of course, the length of the site name will vary. Going back to the previous . (dot) could be an option, but some, as illustrated don't have this (eg //192com, which also doesn't have a dot before the com!)
Is this too complex? If anyone has any pointers or suggestions, I would be very chuffed!!
</TD></TR></TBODY></TABLE>
I've got to analyse some network usage logs at work. IT have sent this to me as a PDF, which is 812 pages long....
I've managed to get this into an Excel Workbook, but need some way to extract web sites visited. An example of the data is here, but bear in mind that the real file has nearly 30,000 rows!
<TABLE style="WIDTH: 315pt; BORDER-COLLAPSE: collapse" cellSpacing=0 cellPadding=0 width=420 border=0 x:str><COLGROUP><COL style="WIDTH: 315pt; mso-width-source: Userset; mso-width-alt: 15360" width=420><TBODY><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; WIDTH: 315pt; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" width=420 height=17>http://0.r.msn.com/?</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.r.msn.com/?</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.r.msn.com/?ld=4v2epkitilceglqpl3bpezdlxgrz7yviw5brm</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.r.msn.com/?ld=4vkylnttmkaflm2icgjiosbdmkkaqrr_v3jop</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.tqn.com site closed 149</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.tqn.com site closed 248</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.tqn.com site opened 149</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://0.tqn.com site opened 248</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.bp.blogspot.com site closed 75</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.bp.blogspot.com site opened 75</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.bp.blogspot.com/favicon.ico image/x-icon bin 1</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.css.savemoneyoncars.co.uk site closed 58</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.css.savemoneyoncars.co.uk site opened 58</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.images.savemoneyoncars.co.uk site closed 68</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://1.images.savemoneyoncars.co.uk site opened 68</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://192com.112.2o7.net site closed 0</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://192com.112.2o7.net site opened 0</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://2.bp.blogspot.com site closed 39</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://2.bp.blogspot.com site opened 39</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-RIGHT: #d4d0c8; BORDER-TOP: #d4d0c8; BORDER-LEFT: #d4d0c8; BORDER-BOTTOM: #d4d0c8; HEIGHT: 12.75pt; BACKGROUND-COLOR: transparent" height=17>http://24.media.tumblr.com site closed 85
As you will see, there's a lot of extraneous data ("Site Opened", "Site Closed" etc) so I'm looking for a way to put in a helper column to show just web addresses, which I can then sort and filter, to hopefully end up with just a short list of sites
I was thinking that if there was some way to search a string to find ".com", ".co.uk" and ".net" this should cover most bases, but of course, the length of the site name will vary. Going back to the previous . (dot) could be an option, but some, as illustrated don't have this (eg //192com, which also doesn't have a dot before the com!)
Is this too complex? If anyone has any pointers or suggestions, I would be very chuffed!!
</TD></TR></TBODY></TABLE>
Last edited: