Cleansing Data to be uploaded into Web Database

kaylee1982

New Member
Joined
Jul 19, 2012
Messages
1
Hello - I only have very basic excel skills so would appreciate your help.

I run a database which is used by Universities to order items of clothing for their academic staff at graduation ceremonies.

In my database I have over 400 Universities and all of these have their own set of awards/degrees i.e. Bachelors, Masters, PhDs etc.

Universty administrators have th abiltiy to upload spreadsheets with this data into a web database to make their ordres. They could have a spreadhseet with anywhere between 10 and 2000 rows. They match the fields on the web to the columns in the spreadsheet and upload. Sounds simple.

The spreadsheets typically have the following format:

UsernameTitleFirst NameLast NameDepartmentInstitution at which you studiedDegreeYear of DegreeHeight
staff1@staff.ac.ukMr.JohnBrownEnglishBirmingham UniversityBA19856' 0"
staff1@staff.ac.ukMs.CharlotteGreenScienceUniversity of ExeterBSc19755' 4"
staff1@staff.ac.ukProf.JennyBlackLawYaleLLM19995' 2"
staff1@staff.ac.ukDr.BobWhiteLanguagesQueen Mary LondonPhD20115' 10"
staff1@staff.ac.ukMr.HarryRedEnglishCambridge UniversityMA20055'11"
staff1@staff.ac.ukDr.JamesGreyScienceUniversity of DurhamPDPhil20036' 2"
staff1@staff.ac.ukMrs.DebbyBlueLawOxfordLLM19845' 7"

<colgroup><col style="width: 91pt; mso-width-source: userset; mso-width-alt: 4425;" width="121"> <col style="width: 36pt; mso-width-source: userset; mso-width-alt: 1755;" width="48"> <col style="width: 56pt; mso-width-source: userset; mso-width-alt: 2706;" width="74"> <col style="width: 53pt; mso-width-source: userset; mso-width-alt: 2596;" width="71"> <col style="width: 62pt; mso-width-source: userset; mso-width-alt: 2998;" width="82"> <col style="width: 156pt; mso-width-source: userset; mso-width-alt: 7606;" width="208"> <col style="width: 39pt; mso-width-source: userset; mso-width-alt: 1901;" width="52"> <col style="width: 74pt; mso-width-source: userset; mso-width-alt: 3620;" width="99"> <col style="width: 36pt; mso-width-source: userset; mso-width-alt: 1755;" width="48"> <tbody>
</tbody>


Some of the data uploads fine...things like Title, name, email address etc. However the items where it is trying to match what is uploaded to items in the database are proving very difficult.

For example. The University of Birmingham can be expressed on the spreadsheet as: 'The University of Birmingham' 'University of Birmingham' 'Birmingham University' ' Birmingham'.

At the moment unless the person creating the spreadsheet puts a value that is exactly the same as what is held in the database the data will not upload and you end up checking the dtatabase against every entry on the spreadsheet. By the time you have done this you may as well have entered the data manually row by row! :( It is made even worse when uploading the degree because each University has its own way of expressing each degree class. i.e. some will have BA other will say All bachelors.

I would ideally like to create a spreadsheet where the adminstrator can put their data in and press a button or a macro which would cleanse this data in some way or show which data/entries would need to be changed prior to upload. I have no idea how to do this though. I have had a play with data validation but I am not skilled enough to know if this is the right thing to do.

I cannot find any topics on the web which explain in plain english how complex data can be successfully uploaded into a database and I am spending days and days manually cleansing data for people to save them the agony.

I am wondering if any of you intelligent folks can offer any suggestions.

Thanks

Kaylee
 

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
Hi Kaylee,

A technique I sometimes use to do this is to create a "Dictionary" in the contaminated workbook. The dictionary is simply an added worksheet (can be hidden) containing a list of what to find in column A and what to replace the found value with in column B.

You will have to adjust the code sample below to use the correct sheet names and columns. Please see the code comments.

The order that you list the words can be important as well as the length (shortest to longest etc). Leading and trailing spaces may also be necessary in the word list to get 100% results.

Example:

Find Replace

Univ University (no space after Univ)

Birmingham UnivUniversity

You could use a trailing space in the find word to avoid the above problem or add another entry in the list to replace UnivUniversity with University. Of course adding extra entries to undo undesired changes will take longer.

Hope it helps.

Gary


Code:
Public Sub Test()

Dim sFind As String
Dim sReplace As String
Dim oDictionary As Worksheet
Dim oData As Worksheet
Dim oWordList As Range
Dim oCell As Range

Dim oSearch As Range

Set oData = ThisWorkbook.Worksheets("Data") ' Change "Data" to name of sheet to be searched

'Create worksheet named "Dictionary" list words or phrases to be changed in column A
'list replacement word or phrase in column B
Set oDictionary = ThisWorkbook.Worksheets("Dictionary")
Set oWordList = oDictionary.Range("A1:A" & oDictionary.Range("A" & Rows.Count).End(xlUp).Row)

Set oSearch = ThisWorkbook.Worksheets("Data").Range("F:F") ' Change F:F to desired column

For Each oCell In oWordList

    sFind = oCell.Text
    sReplace = oCell.Offset(0, 1).Text

    oSearch.Replace what:=sFind, replacement:=sReplace, Lookat:=xlPart, MatchCase:=False
    
Next oCell

End Sub
 
Upvote 0
Hello - I only have very basic excel skills so would appreciate your help.

I run a database which is used by Universities to order items of clothing for their academic staff at graduation ceremonies.

In my database I have over 400 Universities and all of these have their own set of awards/degrees i.e. Bachelors, Masters, PhDs etc.

Universty administrators have th abiltiy to upload spreadsheets with this data into a web database to make their ordres. They could have a spreadhseet with anywhere between 10 and 2000 rows. They match the fields on the web to the columns in the spreadsheet and upload. Sounds simple.

The spreadsheets typically have the following format:

Username
Title
First Name
Last Name
Department
Institution at which you studied
Degree
Year of Degree
Height
staff1@staff.ac.uk
Mr.
John
Brown
English
Birmingham University
BA
1985
6' 0"
staff1@staff.ac.uk
Ms.
Charlotte
Green
Science
University of Exeter
BSc
1975
5' 4"
staff1@staff.ac.uk
Prof.
Jenny
Black
Law
Yale
LLM
1999
5' 2"
staff1@staff.ac.uk
Dr.
Bob
White
Languages
Queen Mary London
PhD
2011
5' 10"
staff1@staff.ac.uk
Mr.
Harry
Red
English
Cambridge University
MA
2005
5'11"
staff1@staff.ac.uk
Dr.
James
Grey
Science
University of Durham
PDPhil
2003
6' 2"
staff1@staff.ac.uk
Mrs.
Debby
Blue
Law
Oxford
LLM
1984
5' 7"

<tbody>
</tbody>


Some of the data uploads fine...things like Title, name, email address etc. However the items where it is trying to match what is uploaded to items in the database are proving very difficult.

For example. The University of Birmingham can be expressed on the spreadsheet as: 'The University of Birmingham' 'University of Birmingham' 'Birmingham University' ' Birmingham'.

At the moment unless the person creating the spreadsheet puts a value that is exactly the same as what is held in the database the data will not upload and you end up checking the dtatabase against every entry on the spreadsheet. By the time you have done this you may as well have entered the data manually row by row! :( It is made even worse when uploading the degree because each University has its own way of expressing each degree class. i.e. some will have BA other will say All bachelors.

I would ideally like to create a spreadsheet where the adminstrator can put their data in and press a button or a macro which would cleanse this data in some way or show which data/entries would need to be changed prior to upload. I have no idea how to do this though. I have had a play with data validation but I am not skilled enough to know if this is the right thing to do.

I cannot find any topics on the web which explain in plain english how complex data can be successfully uploaded into a database and I am spending days and days manually cleansing data for people to save them the agony.

I am wondering if any of you intelligent folks can offer any suggestions.

Thanks

Kaylee
Try the add in fuzzy lookup for excel from Microsoft http://www.microsoft.com/en-us/download/details.aspx?id=15011
 
Upvote 0

Forum statistics

Threads
1,214,833
Messages
6,121,868
Members
449,054
Latest member
juliecooper255

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top