Listing multiple diseases for a single patient

MimiM

New Member
Joined
Apr 23, 2018
Messages
24
Hi

I have an SQL extract table of patients with one or more diseases recorded for them. Each patient/disease information is a separate record/line. Through a pivot, I can see which patient has more than one disease recorded and which diseases are included. However, the pivot contains all possible diseases (columns) and all patients (rows).

My aim is to list each individual patient, with only the diseases recorded against them as columns. I would then try to summarise the combinations of diseases.

Any ideas, please?

Many thanks in advance

Mimi

Structure;
Patient 1 Disease A Practice X
Patient 1 Disease F Practice X
Patient 2 Disease B Practice Y
Patient 2 Disease G Practice Y
Patient 2 Disease H Practice Y
Patient 3 Disease A Practice Z
Patient 3 Disease C Practice Z

Aiming at

Patient 1 Practice X Disease A Disease F
Patient 2 Practice Y Disease B Disease G Disease H
Patient 3 Practice Z Disease A Disease C

etc.
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.

Book1
ABCDEFGHIJ
1
2Patient 1Disease APractice XPatient 1Practice XDisease ADisease F
3Patient 1Disease FPractice XPatient 2Practice YDisease BDisease GDisease H
4Patient 2Disease BPractice YPatient 3Practice ZDisease ADisease C
5Patient 2Disease GPractice Y
6Patient 2Disease HPractice Y
7Patient 3Disease APractice Z
8Patient 3Disease CPractice Z
9
Sheet1


In E2 control+shift+enter, not just enter, and copy down:

=IFERROR(INDEX($A$2:$A$8,SMALL(IF(FREQUENCY(IF(1-($A$2:$A$8=""),MATCH($A$2:$A$8,$A$2:$A$8,0)),ROW($A$2:$A$8)-ROW($A$2)+1),ROW($A$2:$A$8)-ROW($A$2)+1),ROWS($E$2:E2))),"")

In F2 just enter and copy down:

=IF($E2="","",VLOOKUP($E2,$A$2:$C$8,3,0))

In G2 control+shift+enter, not just enter, copy across, and down:

=IFERROR(IF($E2="","",INDEX($B$2:$B$8,SMALL(IF($A$2:$A$8=$E2,ROW($A$2:$A$8)-ROW($A$2)+1),COLUMNS($G2:G2)))),"")
 
Upvote 0
another possible presentation is to use a Pivot Table as follows

Data Range
A
B
C
D
E
F
G
H
I
J
K
L
1
Column1​
Column2​
Column3​
Count of Column2​
Column Labels​
2
Patient 1​
Disease A​
Practice X​
Row Labels​
Disease A​
Disease B​
Disease C​
Disease F​
Disease G​
Disease H​
3
Patient 1​
Disease F​
Practice X​
Patient 1​
4
Patient 2​
Disease B​
Practice Y​
Practice X​
1​
1​
5
Patient 2​
Disease G​
Practice Y​
Patient 2​
6
Patient 2​
Disease H​
Practice Y​
Practice Y​
1​
1​
1​
7
Patient 3​
Disease A​
Practice Z​
Patient 3​
8
Patient 3​
Disease C​
Practice Z​
Practice Z​
1​
1​
9
10
 
Upvote 0
Aladin's solution worked as needed, thanks again.

One step further now - how do I determine the most common combinations of diseases? There are over 16,000 patients with more than one disease recorded against them, from 2 to 13. I would be looking for, say, the top 5 combinations, e.g., Hypertension + Diabetes; Hypertension + Diabetes + Asthma. of course, the way the diseases come from the database in a different order, so the same one will not be in the same place for different patients. Ideally, I'd like to have them all ordered in the same way so that I can see the combinations somehow.

Is it possible at all?


Mimi
 
Upvote 0
And what I'm seriously struggling with is the combination of index-frequency. The explanations and examples of Frequency use somehow don't seem to fit...
 
Upvote 0
Aladin's solution worked as needed, thanks again.

Glad to help.

One step further now - how do I determine the most common combinations of diseases? There are over 16,000 patients with more than one disease recorded against them, from 2 to 13. I would be looking for, say, the top 5 combinations, e.g., Hypertension + Diabetes; Hypertension + Diabetes + Asthma. of course, the way the diseases come from the database in a different order, so the same one will not be in the same place for different patients. Ideally, I'd like to have them all ordered in the same way so that I can see the combinations somehow.

Is it possible at all?


Mimi


Care to elaborate what you mean using the earlier sample? Pretend 3 instead of 5...
 
Upvote 0
Imagine the following

Patient number Disease 1 Disease 2 Disease 3
1 Asthma Diabetes
2 Hypertension Asthma Diabetes
3 Diabetes Asthma
4 Diabetes Hypertension
5 Asthma Hypertension Chronic Kidney Failure
6 Hypertension Diabetes Asthma
And this in thousands of patients and up to 13 diseases

Are there any diseases that come up together more often? Which ones? are the questions that I am trying to answer.

From this very simple example, we could see that diabetes and hypertension come up together 3 times, so do diabetes and asthma, so do hypertension and asthma. The combination of the three is next down the list, etc.

It may be a question that is impossible to answer, I don't know.
 
Upvote 0
Your question about the most common combinations seems to be a very complex one due to
a) You seem to be looking within the list of diseases for individual patients and
b) the order of diseases from the data source has no specific structure/order
This, together with the fact that you seem to have quite a big database, means that I suspect you will get no satisfaction from a formula solution.

I have a possible macro solution below but it still does not look within patient records for combinations but only at the full list for each patient. Given that, the macro will list the "top n" combinations. Change the value of n in the 'Const line near the top of the code.

To get to the result about most common combinations, the code also produces the list for each patient like the formula solution already suggested (so this may possibly replace that formula solution) but the macro lists the diseases for each patient in alphabetical order which may be of some use to you if still inspecting the results manually.

I have assumed that the data is in columns A:C and that there is nothing on the worksheet to the right of that.

In my small example, I have the code listing the diseases for each patient and also listing the top 3 combinations of diseases. Note that my result actually lists 4 combinations as there was a tie for the third most common combination.

Code:
Sub Analyse_Diseases()
  Dim d1 As Object, d2 As Object, AL As Object, SL As Object
  Dim a As Variant, itm As Variant
  Dim i As Long, nc As Long, k As Long, lastval As Long
  
  Const TopNum As Long = 3  '<- Top how many combinations to list
  
  Set d1 = CreateObject("Scripting.Dictionary")
  d1.CompareMode = 1
  Set d2 = CreateObject("Scripting.Dictionary")
  d2.CompareMode = 1
  Set AL = CreateObject("System.Collections.ArrayList")
  Set SL = CreateObject("System.Collections.Sortedlist")
  a = Range("A2", Range("C" & Rows.Count).End(xlUp)).Value
  For i = 1 To UBound(a)
    d1(a(i, 1) & ";" & a(i, 3)) = d1(a(i, 1) & ";" & a(i, 3)) & ";" & a(i, 2)
  Next i
  For i = 1 To d1.Count
    AL.Clear
    For Each itm In Split(Mid(d1.Items()(i - 1), 2), ";")
      AL.Add itm
    Next itm
    AL.Sort
    d1(d1.Keys()(i - 1)) = Join(AL.toarray, ";")
  Next i
  Range("E1:G1").Value = Array("Patient", "Practice", "Diseases")
  With Range("E2").Resize(d1.Count)
    .Value = Application.Transpose(d1.Keys)
    .TextToColumns DataType:=xlDelimited, Tab:=False, Semicolon:=True, Comma:=False, Space:=False, Other:=False
    With .Offset(, 2)
      .Value = Application.Transpose(d1.Items)
      .TextToColumns DataType:=xlDelimited, Tab:=False, Semicolon:=True, Comma:=False, Space:=False, Other:=False
    End With
  End With
  With Range("E1").CurrentRegion
    .Columns.AutoFit
    nc = .Column + .Columns.Count + 1
  End With
  For Each itm In d1.Items()
    d2(itm) = d2(itm) + 1
  Next itm
  For Each itm In d2.Keys()
    SL.Add Format(d2(itm), "00000") & "|" & itm, 1
  Next itm
  SL.Add "00000|x", 1
  i = SL.Count
  Do While SL.GetKey(i - 1) <> "00000|x" And (k < TopNum Or Split(SL.GetKey(i - 1), "|")(0) - lastval = 0)
    Cells(Rows.Count, nc).End(xlUp).Offset(1).Resize(, 2).Value = Split(SL.GetKey(i - 1), "|")
    lastval = Split(SL.GetKey(i - 1), "|")(0)
    k = k + 1
    i = i - 1
  Loop
  With Cells(1, nc).Resize(, 2)
    .Value = Array("Count", "Disease Combination")
    .CurrentRegion.Columns(1).TextToColumns DataType:=xlDelimited, Other:=False, FieldInfo:=Array(1, 1)
    .EntireColumn.AutoFit
  End With
End Sub

My sample data (columns A:C) and results of code (Columns E:L)


Book1
ABCDEFGHIJKL
1PatientDiseasePracticePatientPracticeDiseasesCountDisease Combination
2Patient 1AnginaPractice XPatient 1Practice XAnginaAsthma3Measles;Mumps
3Patient 1AsthmaPractice XPatient 2Practice YAnginaBrain CancerInfluenza3Angina;Brain Cancer;Influenza
4Patient 2Brain CancerPractice YPatient 3Practice ZMeaslesMumps2Asthma;Hypertension
5Patient 2InfluenzaPractice YPatient 4Practice XAsthmaHypertension2Angina;Asthma
6Patient 2AnginaPractice YPatient 5Practice XAnginaAsthma
7Patient 3MumpsPractice ZPatient 6Practice XAsthmaHypertension
8Patient 3MeaslesPractice ZPatient 7Practice XMeaslesMumps
9Patient 4AsthmaPractice XPatient 8Practice XAnginaBrain CancerInfluenza
10Patient 4HypertensionPractice XPatient 9Practice XMeaslesMumps
11Patient 5AsthmaPractice XPatient 10Practice XAnginaBrain CancerInfluenza
12Patient 5AnginaPractice X
13Patient 6HypertensionPractice X
14Patient 6AsthmaPractice X
15Patient 7MeaslesPractice X
16Patient 7MumpsPractice X
17Patient 8InfluenzaPractice X
18Patient 8AnginaPractice X
19Patient 8Brain CancerPractice X
20Patient 9MeaslesPractice X
21Patient 9MumpsPractice X
22Patient 10Brain CancerPractice X
23Patient 10InfluenzaPractice X
24Patient 10AnginaPractice X
Sheet1
 
Upvote 0

Forum statistics

Threads
1,214,409
Messages
6,119,339
Members
448,888
Latest member
Arle8907

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top