VBA - Importing select data from .txt to excel by specific value in a row

jamesearth

New Member
Joined
Jul 14, 2010
Messages
3
Hey everyone,

Just registered for the forum - seems like an incredible community resource!

Anyhow, I did a little searching and found a few threads on VBA imports from text files to excel, but I had a slightly more specific request.

I have a text file that is very large (almost 1GB - over 3m rows), and I want to extract less than 50k rows from it.

The rows I want can be identified by a value in column 4 - 'CreationDate', if 2010-05-01 =< CreationDate =< 2010-06-01 , I want to extract that row to an excel file.

The VBA file doesn't necessarily need to run quickly - I can leave it running on my computer overnight, etc.

Are there easy-to-implement frameworks out there that would do this efficiently? I'm on a Mac right now, but could probably move to a PC, too.

Thanks!
 

Excel Facts

Enter current date or time
Ctrl+: enters current time. Ctrl+; enters current date. Use Ctrl+: Ctrl+; Enter for current date & time.
It probably wouldn't be too hard to come up with something. Are you able to publish a sample dataset?

EDIT:PS if you're using a pc platform, it might be possible to use sql, otherwise you'd probably have to read the file one line at a time and test the 4th data item
 
Last edited:
Upvote 0
I've knocked this together, after a fair bit of surfing:

Code:
Sub Read_Text_File()
    Dim fPath, fName, field, uLimit, lLimit, destSht
    Application.ScreenUpdating = False
    Set rs = CreateObject("ADODB.recordset")
    Set conn = CreateObject("ADODB.Connection")
    fPath = "C:\myFolder\" 'file path of text file
    fName = "testdata.csv" 'file name of text file
    field = "creation date"
    uLimit = "01 may 2010"
    lLimit = "01 jun 2010"
    Set destSht = Sheets("import")
    conn.Open "DRIVER={Microsoft Text Driver (*.txt; *.csv)};DBQ=" & fPath & ";", "", ""
    rs.Open "select * from " & fName & " where ([" & field & "]>#" & uLimit & "# and [" & field & "]<#" & lLimit & "#);", conn, 3, 1, 1
    With rs
        For c = 0 To .fields.Count - 1
            destSht.Cells(1, c + 1) = .fields(c).Name
        Next c
        .movefirst
        r = 2
        Do While Not .EOF = True
            For c = 0 To .fields.Count - 1
                destSht.Cells(r, c + 1) = .fields(c)
            Next c
            r = r + 1
            .movenext
        Loop
    End With
    Set rs = Nothing
    Set conn = Nothing
End Sub
Might give odd results if your data doesn't have a header tho.
 
Upvote 0
PS I don't know how it'll handle 3m records either - my sample only had 250!
 
Upvote 0
licenseStateName licenseProductName licenseKey licenseCreationDate licenseExpirationDate userFirstName userLastName userEmail marketingInfoState marketingInfoCountry marketingInfoIndustry postalCode accountId accountName licenseGroupId licenseGroupName licenseId licenseSeatsNumber licenseSessionCount lastViewerVersion licenseNewsletterOptIn licensePromotionOptIn numAccountLicenses ContactName ContactTitle ContactEmail ContactPhone
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 neha lastname name@name.com null null null null 9411884 1 13 13 4 9411690 Software Online Order 9425002 2 null null X X 1 None None null null null
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 Michael lastname name@name.com null null null null 9411838 2 13 13 4 9411643 Software Online Order 9424956 2 2 PR 1 X X 1 None None null null null
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 Craig lastname name@name.com null null null null 9411819 3 13 13 4 9411624 Software Online Order 9424937 2 7 PR 2 X X 1 None None null null null
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 David lastname name@name.com null null null null 9412025 4 13 13 4 9411830 Software Online Order 9425155 2 14 PR 3 X X 1 None None null null null
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 Yandong lastname name@name.com null null null null 9411676 5 13 13 4 9411480 Software Online Order 9424790 2 null null X X 1 None None null null null
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 null null name@name.com null null null null 9412116 6 13 13 4 9411924 Software Online Order 9425256 2 null null X X 2 None None null null null
customer kh XXXXXXXXXXXX 6/30/10 6/30/11 null null name@name.com null null null null 9412116 7 13 13 4 9411924 Software Online Order 9425255 2 null null X X 2 None None null null null

That is the exact format - in small quantities like this, it imports to excel with the wizard using delimited spacing. The date by which we want to identify is right after the "XXXXXXXXXXXX" - in these cases, almost all 6/30/10. Note in the txt file, the date format is 2011-06-30.

Thanks for your help!!!
 
Upvote 0
Okay, just to be clear, there IS a header row, but the data is SPACE delimited.

The previous code, I believe, relies on the data being comma delimited. I'll have a look later to see if there's a way of setting the delimiter.

HTH
 
Upvote 0
Right, I think this ought to do the job. Obviously you're going to need to put your own file path in there and it may not also be desirable to dump the data to the current worksheet, but it should get you started.

As it turns out, I believe the date format is a definite advantage in this scenario.

Code:
Sub importData()
    Application.ScreenUpdating = False
    Dim InputData, rec, l, creationDate
    Open "c:\local\testdata.txt" For Input As #1    ' Open file for input.
    l = 1
    Do While Not EOF(1)    ' Check for end of file.
    Line Input #1, InputData    ' Read line of data.
        rec = Split(InputData, " ")
        If l = 1 Then 'header row
            l = outputData(rec, l)
        Else
            creationDate = rec(3)
            If creationDate >= "2010-05-01" And creationDate <= "2010-06-01" Then
                l = outputData(rec, l)
            End If
        End If
    Loop
    Close #1
End Sub
 
 
Function outputData(records, lineNo)
    Dim i
    For i = 0 To UBound(records)
        ActiveSheet.Cells(lineNo, i + 1) = records(i)
    Next
    outputData = lineNo + 1
End Function
HTH
 
Last edited:
Upvote 0
ok here is something I use to extract data from a CSV file you will need to add to the schema routine to define ALL of your fields i just use simple F1, F2 etc and define the type, change the delimiter bit to a space, the schema file needs to be in the same folder as the data

finally change the SQL select to select by dates, you may need a # around the dates

'****
' Declare some global variables
'****

Public FileName As String
Public StrPathToTextFile As String

Sub ParseDataFile()

'****
' Setup constants for the SQL query
'****

Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1

'****
' Declare stuff
'****

Dim StrSQL As String

'****
' create objects that we need
'****

Set objConnection = CreateObject("ADODB.Connection")
Set ObjRecordSet = CreateObject("ADODB.Recordset")

StrPathToTextFile = "C:\mydata\"
FileName = "file1.csv"

Call CreateSchema

'****
' All preparatory work done, lets begin the main process
'****

'****
' Open JET connection to our data file
'****

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & StrPathToTextFile & ";" & _
"Extended Properties=""text;HDR=YES;FMT=Delimited"""

'****
' Build SQL strings to get unique categories in no order
'****

StrSQL = "SELECT * FROM " & StrFile
StrSQL = StrSQL & " WHERE F5 <> 0"

'****
' Execute the SQL and get the results and store for next stage
'****

ObjRecordSet.Open StrSQL, objConnection, adOpenStatic, adLockOptimistic, adCmdText

Range("A1").CopyFromRecordset ObjRecordSet
Columns("A:F").EntireColumn.AutoFit

ObjRecordSet.Close

End Sub
Sub CreateSchema()

Open StrPathToTextFile & "schema.ini" For Output As #1

Print #1, "[" & StrFile & "]"
Print #1, "Format=Delimited(,)"
Print #1, "ColNameHeader = False"
Print #1, "MaxScanRows=0"
Print #1, "Col1=F1 Integer"
Print #1, "Col2=F2 Integer"
Print #1, "Col3=F3 Integer"
Print #1, "Col4=F4 Integer"
Print #1, "Col5=F5 Integer"
Print #1, "Col6=F6 Text"

Close #1

End Sub
 
Upvote 0
Hi,

My requirement is also similar, could I get some help:

The rows I want can be identified by a value in column named - "PRODUCT_CODE", if it contains the any one of the following codes 4007, 4008, 4009, 4011/C, 4011/E, 4020, 4021, 4022, 4023, 4024, 4025, 4066, 4067, 4070, 4074, 4075, 4076, 4078, 4092
 
Upvote 0

Forum statistics

Threads
1,214,824
Messages
6,121,783
Members
449,049
Latest member
greyangel23

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top