Delete rows that contains certain text - fastest way

saulodefaria

New Member
Joined
Jul 8, 2014
Messages
3
Hi guys!

Here's my problem: I have some huge txt files (~700k lines) with csv data from stock prices. One of the columns is the name of the company. So I want to get only the lines of certain companies. I actually already have a solution, but it is taking too much time. I think it may take hours to filter just one txt file, but I have lots of them...

Here's the code I'm using now:

Code:
Sub delete()


Dim dontDelete
dontDelete = Array("APPLE", "MICROSOFT", "GOOGLE")


Dim i As Long, j As Long


Dim isThere As Boolean


For i = Range("A" & Rows.Count).End(xlUp).Row To 1 Step -1
    For j = LBound(dontDelete) To UBound(dontDelete)
        If InStr(1, Range("A" & i), dontDelete(j)) Then
            isThere = True
        End If
    Next j
    If Not isThere Then
        Range("A" & i).delete shift:=xlUp
    End If
    isThere = False
Next i


End Sub

It does the job. But I was hoping I could get some ideas to do it faster.

Thanks!
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
Try this...

Code:
[color=darkblue]Sub[/color] delete()
    
    [color=darkblue]Dim[/color] LastCol [color=darkblue]As[/color] [color=darkblue]Long[/color], i [color=darkblue]As[/color] Long
    [color=darkblue]Dim[/color] dontDelete [color=darkblue]As[/color] [color=darkblue]Variant[/color], v [color=darkblue]As[/color] [color=darkblue]Variant[/color]
    
    dontDelete = Array("APPLE", "MICROSOFT", "GOOGLE")
    v = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value
    LastCol = Cells.Find("*", , , , 2, 2).Column + 1
    
    [color=darkblue]For[/color] i = 1 [color=darkblue]To[/color] [color=darkblue]UBound[/color](v, 1)
        [color=darkblue]If[/color] IsNumeric(Application.Match(v(i, 1), dontDelete, 0)) [color=darkblue]Then[/color]
            v(i, 1) = i
        [color=darkblue]Else[/color]
            v(i, 1) = ""
        [color=darkblue]End[/color] [color=darkblue]If[/color]
    [color=darkblue]Next[/color] i
    
    Application.ScreenUpdating = [color=darkblue]False[/color]
    [color=darkblue]With[/color] Cells(1, LastCol).Resize(UBound(v, 1))
        .Value = v
        .EntireRow.Sort Cells(1, LastCol), xlDescending, Header:=xlNo
        .SpecialCells(xlCellTypeBlanks).EntireRow.ClearContents
        .ClearContents
    [color=darkblue]End[/color] [color=darkblue]With[/color]
    Application.ScreenUpdating = [color=darkblue]True[/color]
    
[color=darkblue]End[/color] [color=darkblue]Sub[/color]
 
Upvote 0
This assumes all of the cells with the company name you are looking for are in column A. You're basically doing a find for the company name and replacing it with a blank cell. Then Goto Special, Select Blanks, and delete all of the sheet rows in column A that contain blank cells. Then you can tweak it for ease of use. Depending on the version of excel you have and the amount of memory on your machine this may or may not work. I've had sheets with 60,000 rows on an XP machine with 3 GB ram and Excel 2007 choke on this but a windows 7 machine with 8 GB work just fine. Have no idea what 700,000 rows will do though.

Code:
Columns("A:A").Select
    Selection.Replace What:="CompanyName", Replacement:="", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Selection.SpecialCells(xlCellTypeBlanks).Select
    Selection.EntireRow.Delete

Sorry, I reread your post and this is not what your looking for.
 
Last edited:
Upvote 0
saulodefaria,

Welcome to the MrExcel forum.

1. What version of Excel and Windows are you using?

2. Are you using a PC or a Mac?

If I understand you correctly.

I assume that your text file data only uses column A.

Sample raw data in column A (not all 756,000 rows are shown):


Excel 2007
A
1Apple1
2Microsoft2
3Google3
4other4
5another5
6one more6
7Apple7
8Microsoft8
9Google9
10other10
11another11
12one more12
13Apple13
14Microsoft14
15Google15
16other16
17another17
18one more18
19Apple19
20Microsoft20
21Google21
22other22
23another23
24one more24
25Apple25
26Microsoft26
27Google27
Sheet1


After the macro using two arrays in memory (with a macro run time of 9.594 seconds):


Excel 2007
A
1Apple1
2Microsoft2
3Google3
4Apple7
5Microsoft8
6Google9
7Apple13
8Microsoft14
9Google15
10Apple19
11Microsoft20
12Google21
13Apple25
14Microsoft26
15Google27
16Apple31
17Microsoft32
18Google33
19Apple37
20Microsoft38
21Google39
22Apple43
23Microsoft44
24Google45
25Apple49
26Microsoft50
27Google51
Sheet1


Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

1. Copy the below code
2. Open your NEW workbook
3. Press the keys ALT + F11 to open the Visual Basic Editor
4. Press the keys ALT + I to activate the Insert menu
5. Press M to insert a Standard Module
6. Where the cursor is flashing, paste the code
7. Press the keys ALT + Q to exit the Editor, and return to Excel
8. To run the macro from Excel press ALT + F8 to display the Run Macro Dialog. Double Click the macro's name to Run it.

Code:
Option Explicit
Option Compare Text
Sub KeepAppleMicrosoftGoogle()
' hiker95, 07/09/2014, ME790233
Dim oa As Variant, a As Variant
Dim i As Long, j As Long, lr As Long
Application.ScreenUpdating = False
lr = Cells(Rows.Count, 1).End(xlUp).Row
Set oa = Range("A1:A" & lr)
ReDim a(1 To lr, 1 To 1)
For i = 1 To lr
  If InStr(oa(i, 1), "apple") Or InStr(oa(i, 1), "microsoft") Or InStr(oa(i, 1), "google") Then
    j = j + 1
    a(j, 1) = oa(i, 1)
  End If
Next i
Range("A1:A" & lr) = a
Application.ScreenUpdating = True
End Sub

Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the KeepAppleMicrosoftGoogle macro.
 
Last edited:
Upvote 0
I was almost giving up thinking "700k lines, it's imposible to do that fast..." but then this! haha thanks guys!

AlphaFrog, your code ran very fast but I don't know why it deleted all rows :/

hiker95, your code ran perfectly. It does the job in less than 10 seconds!

Thanks a lot!
 
Upvote 0
saulodefaria,

Sample raw data with more than one column:


Excel 2007
AB
1Apple11
2Microsoft22
3Google33
4other44
5another55
6one more66
7Apple77
8Microsoft88
9Google99
10other1010
11another1111
12one more1212
13Apple1313
14Microsoft1414
15Google1515
16other1616
17another1717
18one more1818
19Apple1919
20Microsoft2020
21Google2121
22other2222
23another2323
24one more2424
25Apple2525
26Microsoft2626
27Google2727
Sheet1


After the macro:


Excel 2007
AB
1Apple11
2Microsoft22
3Google33
4Apple77
5Microsoft88
6Google99
7Apple1313
8Microsoft1414
9Google1515
10Apple1919
11Microsoft2020
12Google2121
13Apple2525
14Microsoft2626
15Google2727
16Apple3131
17Microsoft3232
18Google3333
19Apple3737
20Microsoft3838
21Google3939
22Apple4343
23Microsoft4444
24Google4545
25Apple4949
26Microsoft5050
27Google5151
Sheet1


Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

Code:
Option Explicit
Option Compare Text
Sub KeepAppleMicrosoftGoogleV2()
' hiker95, 07/09/2014, ME790233
Dim oa As Variant, a As Variant
Dim i As Long, j As Long, lr As Long, lc As Long, c As Long
Application.ScreenUpdating = False
lr = Cells(Rows.Count, 1).End(xlUp).Row
lc = Cells.Find("*", , xlValues, xlWhole, xlByColumns, xlPrevious, False).Column
Set oa = Range(Cells(1, 1), Cells(lr, lc))
ReDim a(1 To lr, 1 To lc)
For i = 1 To lr
  If InStr(oa(i, 1), "apple") Or InStr(oa(i, 1), "microsoft") Or InStr(oa(i, 1), "google") Then
    j = j + 1
    For c = 1 To lc
      a(j, c) = oa(i, c)
    Next c
  End If
Next i
Range(Cells(1, 1), Cells(lr, lc)) = a
Application.ScreenUpdating = True
End Sub

Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the KeepAppleMicrosoftGoogleV2 macro.
 
Upvote 0
hiker95,

Just one more thing: actually there are like 20 companies that I want to keep (not only 3), so is there a smarter way to put all the company names as constraint (maybe in an array) instead of doing ---InStr(oa(i, 1), "companyX") Or--- to all companies?

Thanks in advance!
 
Upvote 0
I was almost giving up thinking "700k lines, it's imposible to do that fast..." but then this! haha thanks guys!

AlphaFrog, your code ran very fast but I don't know why it deleted all rows :/

hiker95, your code ran perfectly. It does the job in less than 10 seconds!

Thanks a lot!

I was looking for whole matches for what it's worth.
Hiker95 does good code and it appears to work for you.
 
Upvote 0
saulodefaria,

hiker95, your code ran perfectly. It does the job in less than 10 seconds!

Thanks for the feedback.

You are very welcome. Glad I could help.

And, come back anytime.
 
Upvote 0

Forum statistics

Threads
1,214,805
Messages
6,121,656
Members
449,045
Latest member
Marcus05

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top