Extract data from multiple text files to Excel

joand

Active Member
Joined
Sep 18, 2003
Messages
267
In E:\Test folder, I have multiple INPUT text files. Each text file contains multiple datasets. Each dataset is a record that begins with the ID tag and ends with the ENDID tag. I want to extract the VALUES of each MAIN TAG in the file and put them into an EXCEL sheet for easier viewing. The SHEET will have a column each for the MAIN tags and the values for each tag in the succeeding rows.

HTML:
<ID> TAG VALUE:
Value between <ID>[space] and [NEXT LINE]<AGE>

<AGE> TAG VALUE:
Value between <AGE>[space] and [NEXT LINE]<PUBDATE>

<PUBDATE> TAG VALUE:
Value between <PUBDATE>[space] and [NEXT LINE]<WRITER>

<WRITER> TAG VALUE:
Value between <WRITER>[space] and [NEXT LINE]<LANGUAGE>

<LANGUAGE> TAG VALUE
Value between <LANGUAGE>[space] and [NEXT LINE]<PAGE>

<PAGE> TAG VALUE:
Value between <PAGE>[space] and [NEXT LINE]<DRAWINGS>

<DRAWINGS> TAG VALUE:
Value between <DRAWINGS>[space] and [NEXT LINE]<TITLE>

<TITLE> TAG VALUE:
Value between <TITLE>[space] and [NEXT LINE]<INVENTION>

<INVENTION> TAG VALUE:
Value between <INVENTION>[space] and [NEXT LINE]<DESCRIPTION>

<DESCRIPTION> TAG VALUE:
Value between <DESCRIPTION>[space] and [NEXT LINE]<USES>

<USES> TAG VALUE:
Value between <USES>[space] and [NEXT LINE]<ADDITIONAL>

<ADDITIONAL> TAG VALUE:
Value between <ADDITIONAL>[space] and [NEXT LINE]<FIGURE>

<FIGURE> TAG VALUE:
Value between <FIGURE>[space] and [NEXT LINE]<ENDID>

To illustrate better, please find below a sample of TWO datasets for illustration. The datasets are separated from each other by a SPACE (like the one below)

HTML:
<ID> 1132234-122X3
<AGE> 23
<PUBDATE> 20101022
<WRITER> 6996
<LANGUAGE> GER
<PAGE> 31
<DRAWINGS> 1/5
<TITLE> Morphometric device for determining necessity of food ingestion
<INVENTION> This device includes <BOLD>housing</BOLD>, <BOLD>transparent tubes</BOLD>, <BOLD>light source</BOLD>, <BOLD>diffusing screen</BOLD>, photodetectors, video camera, interface connected to computer for processing information, and software specific of data processing.
<DESCRIPTION> A method of designing feeding strategy for marine and continental fish species.
<USES> For determining necessity of food ingestion of fish species.
<ADDITIONAL> The device allows designing feeding strategy for fish species by evaluating nutritional state and growth potential.
<FIGURE> The drawing is a side view of morphometric device.
<ENDID> 1132234-122X3

<ID> 1222235-122X1
<AGE> 23
<PUBDATE> 20101022
<WRITER> 6996
<LANGUAGE> GER
<PAGE> 31
<DRAWINGS> 1/5
<TITLE> Thermal solar collector for absorbing heat from atmosphere
<INVENTION> A thermal solar collector comprises a coil that circulates a <BOLD>water/glycol mixture</BOLD>, and a <BOLD>thermodynamic panel<BOLD> coupled to solar collector via separators, <BOLD>polyurethane panel or insulator</BOLD>. Two circuits are provided to optimize the solar energy collecting panel.
<DESCRIPTION> A thermal solar collector for absorbing heat from the atmosphere for air conditioning, and for producing hot water for swimming pools.
<USES> A thermal solar collector for absorbing heat from the atmosphere
<ADDITIONAL> The thermal solar collector has a great capacity to absorb solar heat, saves cost, and requires only little space for installation.
<FIGURE> The drawings are perspective views of the thermal solar collector.
<ENDID> 1222235-122X1

The OUTPUT Excel Sheet would contain the following columns:

HTML:
Column A: All <ID> values
Column B: All <AGE> values
Column C: All <PUBDATE> values
Column D: All <WRITER> values
Column E: All <LANGUAGE> values
Column F: All <PAGE> values
Column G: All <DRAWINGS> values
Column H: All <TITLE> values
Column I: All <INVENTION> values
Column J: All <DESCRIPTION> values
Column K: All <USES> values
Column L: All <ADDITIONAL> values
Column M: All <FIGURE> values

The values for these columns would be the values for these specific tags in the document. Each ID tag is UNIQUE for each dataset. The ENDID has the same value as the ID tag and has no specific use except that it indicates the end of the dataset.

Can anyone help get started with this? I have little knowledge with VBA. Any help would be greatly appreciated.
 

Excel Facts

Bring active cell back into view
Start at A1 and select to A9999 while writing a formula, you can't see A1 anymore. Press Ctrl+Backspace to bring active cell into view.

Forum statistics

Threads
1,224,545
Messages
6,179,432
Members
452,915
Latest member
hannnahheileen

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top