Transforming irregular paragraph-style data into Excel

callumv

New Member
Joined
Aug 14, 2016
Messages
4
I've been tinkering with this problem for a while, but haven't been able to make much headway; I'm sure there's probably a simple solution I'm missing.

I have a .txt dataset of the form;

ID number
Name
Address1
City
Tel

ID number
Name
Address1
Address2
City

that I've been trying to transform into an excel document of the form

ID number | Name | Address1 | Address2 | City | Tel
ID number | Name | Address1 | Address2 | City | Tel

with the slight problem that the data is irregular (ie some individuals have tel no, some have left out and so on, so paragraphs not consistent number of lines.

I've tried instructing the paragraph break as a delimiter, inserting line end commas in Notepad++ and so on, but that doesn't seem to be recognised; I don't suppose anyone has an easier way of doing this? There are several hundred people in the dataset so manual transformation isn't practical.
 

Excel Facts

Add Bullets to Range
Select range. Press Ctrl+1. On Number tab, choose Custom. Type Alt+7 then space then @ sign (using 7 on numeric keypad)
Do the paragraphs have headers?

e.g.

ID 1234
Name Bob
Address1 335 Main St.

ID 1235
Name Sally
Phone (374) 555-1212
City Newark NJ


If not are the data always in the same order
i.e. ID always before Name (if it exists) before Address1 (if it exists) etc.
Is the ID always present? Is it unique to each record? Does it have a particular format?

Could you post a short sample of possible data.
 
Last edited:
Upvote 0
Hi Mike, thanks for the quick response.

Each paragraph does have a header (a unique database ID number of 5-6 digits), and the entries are always in the same order (so Address follows Name follows unique ID# even when some data is not present). This is basically an unusually-shaped .txt report from a database that was accidentally deleted by a colleague and that I'm now having to recover.

A Jane Doe-d sample (ie not genuine data);

11212,
Ruth Jackson
15 Egerton Rd N
Dorking
DO16 0DB
123 1234 1234




69465
Sain Ahmad
122 Jackson Rd
Rochester
RX21 0NR
123 1234 1234

Many thanks,
Callum
 
Upvote 0
When I said paragraph headers, I meant that each line began with the title of the information that it held. Your example shows that that is not the case.

In your example I notice that the first ID is followed by a , and the second is not.
Are the unique ID's the only data field that is a six digit number?
Is there always a blank line between different records? (Are there any blank lines within a record?)
 
Upvote 0
In your example I notice that the first ID is followed by a , and the second is not.
Are the unique ID's the only data field that is a six digit number?
Is there always a blank line between different records? (Are there any blank lines within a record?)
In addition to the question that Mike has asked above, I would like to know what fields in your records are never missing. For example, will there always be a name? Will there always be at least one address line? Will there always be a city? Will there always be a postal code?
 
Upvote 0
Ah, apologies for the confusion re record headers.

The additional ',' is a typo, the dataset does not end any lines with ',' at present though I can alter this in Notepad++ if needed.

The unique ID number varies between five and six digits (presumably owing to some transaction involved in its creation removing an initial 0 from some records). Although there are phone numbers in the dataset (so other solely numeric fields), the only numeric fields of 5 or 6 digits length are the unique ID numbers.

There is a single blank line separating each record, and no blank lines within any record (missing fields are simply omitted).

In answer to Rick's question;

Each record has a unique ID number, name and postal code (eg RX21 0NR), other fields were apparently optional during the creation of the data pool several months ago and so some records do not include them.
 
Upvote 0
In answer to Rick's question;

Each record has a unique ID number, name and postal code (eg RX21 0NR), other fields were apparently optional during the creation of the data pool several months ago and so some records do not include them.
Are all your postal codes 4 characters followed by a space followed by 3 characters?

What would a two-line address look like? What I am trying to do is figure out when the text above the postal code is a city name rather than the 2nd line of an address.
 
Upvote 0
Are all your postal codes 4 characters followed by a space followed by 3 characters?


In Britain, postal codes are always a 6/7-length alphanumeric string, and can be of the form


ABC DEF
or
ABCD EFG

What would a two-line address look like? What I am trying to do is figure out when the text above the postal code is a city name rather than the 2nd line of an address.


A 2-line address (ie fields Address 1 + Address 2) could look like;

12 Street Address
Suburb
City

and so Address 2 is often vestigial, just adding geographical context.


Is it possible to import data into a column and request that excel treat each empty line as a signal to switch to a new row? Could I then separate each of those cells into their rows horizontally using what were previously line-end commas as delimiters?


ie

(having used NP++ to add line-end commas)

11212,
Ruth Jackson,
15 Egerton Rd N,
Dorking,
DO16 0DB,
123 1234 1234,

69465,
Sain Ahmad,
122 Jackson Rd,
Rochester,
RX21 0NR,
123 1234 1234,

into

11212, Ruth Jackson, 15 Egerton Rd N, Dorking, DO16 0DB, 123 1234 1234
69465, Sain Ahmad, 122 Jackson Rd, Rochester, RX21 0NR, 123 1234 1234

into

11212 | Ruth Jackson | 15 Egerton Rd N | Dorking | DO16 0DB | 123 1234 1234
69465 | Sain Ahmad | 122 Jackson Rd | Rochester | RX21 0NR | 123 1234 1234
 
Upvote 0

Forum statistics

Threads
1,214,806
Messages
6,121,667
Members
449,045
Latest member
Marcus05

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top