Flag or Delete a line from Sheet 1 if a wildcard search matches from an Exclude list of words in sheet 2

Sandmanninja

New Member
Joined
Mar 23, 2014
Messages
3
SITUATION:
I am using Excel 2007 on a Windows 7 Professional environment.
I have a list of about 30,000 URLS (from a proxy log)

Desired Results:
I need to (somehow - I don't care how) search through the column of data on sheet 1 and compare it to a list of Exclude Words on sheet 2.
The sheet 2 has about 15-20 words that, if ANY wildcard matches are TRUE, then either the line with the match on sheet 1 is deleted or a certain value (TRUE? KILL?) is entered (automatically) into a cell on that line.


SHEET 1:
URL Browse Time
FXP://corp.icsfl.com 0:03:00
HXXP://9msn.com.au 5:45:00
HXXP://a.rad.msn.com 5:42:00
HXXP://a.scorecardresearch.com 0:48:00
HXXP://ad.afy11.net 0:03:00
HXXP://ad.au.doubleclick.net 0:36:00
HXXP://ad.doubleclick.net 1:57:00
HXXP://ad.turn.com 0:09:00
HXXP://ad.yieldmanager.com 0:12:00
HXXP://ad-ace.doubleclick.net 0:42:00
HXXP://adadvisor.net 0:12:00
HXXP://ad-emea.doubleclick.net 0:15:00
HXXP://adfarm.mediaplex.com 0:03:00
HXXP://ads.rtbidder.net 0:03:00
HXXP://adx.adnxs.com 0:03:00
HXXP://aidps.atdmt.com 0:03:00
HXXP://ajax.aspnetcdn.com 0:03:00
HXXP://ajax.googleapis.com 0:03:00
HXXP://ajax.microsoft.com 0:03:00
HXXP://amch.questionmarket.com 0:03:00
HXXP://api.bing.com 5:42:00
HXXP://api.parsely.com 0:03:00
HXXP://apis.google.com 0:06:00
HXXP://apnstatic.ask.com 0:03:00
(just to list a few of the 30,000 lines)


SHEET 2 (exclude list):
google
doubleclick
mediaplex
bing

So if *google* is within ANY line of the sheet 1, it will either delete the entire line OR have some flag set in a nearby cell.

I don't care if the execution is efficient or not. I have to do a report every month and I'm doing this BY HAND.
(We're monitoring web usage in a school and need to exclude the irrelevant URLs)
Please, please, oh god please help me.

I've played with vlookup, advance filter, and no matter what I try, I cannot get it quite right.
The closest was with a vlookup but it reported far too many false positives and then just skipped what should have been matches.

I thank you for your assistance.
 

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
Welcome to MrExcel

Here is one approach.
Try it on a small set of data first

Code:
Sub main()
    Const DataSheet = "Sheet1"   'Name of sheet with data
    Const ExluSheet = "Sheet2"   'Name of sheet with words to exclude
    Dim wd, c, firstaddress
    Dim lrData: lrData = Sheets(DataSheet).Cells(Sheets(DataSheet).Rows.Count, 1).End(xlUp).Row
    Dim lrExlu: lrExlu = Sheets(ExluSheet).Cells(Sheets(ExluSheet).Rows.Count, 1).End(xlUp).Row
    For Each wd In Worksheets(ExluSheet).Range("A1:A" & lrExlu)
        With Worksheets(DataSheet).Range("a1:a" & lrData)
            Set c = .Find(wd, LookIn:=xlValues, LookAt:=xlPart)
            If Not c Is Nothing Then
                firstaddress = c.Address
                Do
                    Sheets(DataSheet).Cells(c.Row, 2) = "#N/A"
                    Sheets(DataSheet).Cells(c.Row, 3) = wd
                    Set c = .FindNext(c)
                Loop While Not c Is Nothing And c.Address <> firstaddress
            End If
        End With
    Next
End Sub
 
Upvote 0
Maybe just a formula?

A​
B​
C​
D​
1​
Exclude​
2​
google
3​
doubleclick
4​
mediaplex
5​
bing
6​
7​
URL​
Time​
Exclude?​
8​
FXP://corp.icsfl.com
00:03:00​
0​
C8: {=SUM(COUNTIF(A8, "*" & $C$2:$C$5 & "*"))}
9​
HXXP://9msn.com.au
05:45:00​
0​
10​
HXXP://a.rad.msn.com
05:42:00​
0​
11​
HXXP://a.scorecardresearch.com
00:48:00​
0​
12​
HXXP://ad.afy11.net
00:03:00​
0​
13​
HXXP://ad.au.doubleclick.net
00:36:00​
1​
14​
HXXP://ad.doubleclick.net
01:57:00​
1​
15​
HXXP://ad.turn.com
00:09:00​
0​
16​
HXXP://ad.yieldmanager.com
00:12:00​
0​
17​
HXXP://ad-ace.doubleclick.net
00:42:00​
1​
18​
HXXP://adadvisor.net
00:12:00​
0​
19​
HXXP://ad-emea.doubleclick.net
00:15:00​
1​
20​
HXXP://adfarm.mediaplex.com
00:03:00​
1​
21​
HXXP://ads.rtbidder.net
00:03:00​
0​
22​
HXXP://adx.adnxs.com
00:03:00​
0​
23​
HXXP://aidps.atdmt.com
00:03:00​
0​
24​
HXXP://ajax.aspnetcdn.com
00:03:00​
0​
25​
HXXP://ajax.googleapis.com
00:03:00​
1​
26​
HXXP://ajax.microsoft.com
00:03:00​
0​
27​
HXXP://amch.questionmarket.com
00:03:00​
0​
28​
HXXP://amch.questionmarket.com
00:03:00​
0​
29​
HXXP://api.parsely.com
00:03:00​
0​
30​
HXXP://apis.google.com
00:06:00​
1​
31​
HXXP://apnstatic.ask.com
00:03:00​
0​
 
Upvote 0
The subroutine worked - thank you.
I would prefer to make it an inline function and was thinking about a wild card with a count (as I'm more comfortable with them).
Having said that, the subroutine runs really fast (had 40,000+ this month) and I spent about an hour adding words to the exclude list, running the sub, sorting, and seeing what was left over.
Thanks again!
 
Upvote 0

Forum statistics

Threads
1,217,413
Messages
6,136,474
Members
450,015
Latest member
excel_beta_345User

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top