CSV with HTML content - parts of HTML outside cell in new lines

vanaspati

New Member
Joined
Dec 22, 2016
Messages
1
I have exported a CSV from a tool with data of around 766 items. There are a lot of columns but only column for Description, which contains the HTML markup (html with some js and css) with lots of double and single quotes.

I am going to use a formula to extract some parts from the HTML Description:
=MID(B2,SEARCH("startstring",B2)+36,SEARCH("endstring",B2)-SEARCH("startstring",B2)+37)

Let's ignore the above formula for now. Right now I am seeing an issue when opening the file in Excel which is to say in simple terms, some parts/markup are outside of the cell of that particular row. It is true for quite a few rows.

I have attached an image as well.
https://i.imgur.com/7K3pwZ3.jpg

As you can see in the image above, the first 2 row of items are fine, but after the 3rd the code gets outside the cell and creates new lines of it's own. Is there any way to resolve this as this happens for quite a few items?
 

Excel Facts

Excel motto
Not everything I do at work revolves around Excel. Only the fun parts.
Hi vanaspati,

excel might have a hard time importing the CSV when there is all kinds of characters like tabs, quotes etc in there. What you could do is either try to fiddle with the import (there are some settings you could try), trying to set up the export so it will simply export plain text. The third option is to run VBA to clean the file before you import it. Below is a function I created to clean a CSV file that occasionally would have some weird stuff in it, hope it's helpful.

Cheers,

Koen

Code:
Sub FixNamedRiskFile(FlNm As String)
    
    'Macro cleans up Named Risks File so it can be imported
    Dim fso As Object, tsIn As Object, tsOut As Object
    Dim TheLine As String, PathIn As String, PathOut As String
    Dim arr As Variant
    
    varReturn = SysCmd(acSysCmdSetStatus, "Cleaning file")
    PathIn = FlNm
    PathOut = FlNm & "new"
    
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set tsIn = fso.OpenTextFile(PathIn)
    Set tsOut = fso.CreateTextFile(PathOut, True)
    
    n = 0
    Do Until tsIn.AtEndOfStream
        sTemp = tsIn.ReadLine
        'Cleaning the field Named_risk, as it might contain enters, ; and other unwanted characters
        If Len(sTemp) < 290 And Len(sTemp) <> 209 Then
            'Merge 2 lines, as there seems a linebreak in the field
            Debug.Print "EXTRA LINE"
            Debug.Print Len(sTemp)
            Debug.Print sTemp
            Debug.Print sTemp2
            sTemp2 = tsIn.ReadLine
            sTemp = sTemp & sTemp2
        End If
        
        'Replace ; in name field
        If InStr(Mid(sTemp, 39, 76), ";") > 0 Then
            sTemp2 = Left(sTemp, 38) & Replace(Mid(sTemp, 39, 76), ";", "-") & Right(sTemp, Len(sTemp) - 118)
            sTemp = sTemp2
        Else
        End If
        
        'Get rid of ", ', and `
        sTemp = Replace(sTemp, Chr(34), "~")
        sTemp = Replace(sTemp, Chr(4), "0")
        sTemp = Replace(sTemp, "'", "~")
        sTemp = Replace(sTemp, "`", "~")
        
        'Write result
        tsOut.WriteLine sTemp
    Loop
    
    tsIn.Close
    tsOut.Close
    varReturn = SysCmd(acSysCmdSetStatus, " ")
    
    'Rename the file and throw away the old one
    Kill PathIn
    Name PathOut As PathIn
    
    Set tsIn = Nothing
    Set tsOut = Nothing
    Set fso = Nothing
    
End Sub
 
Upvote 0

Forum statistics

Threads
1,214,845
Messages
6,121,902
Members
449,053
Latest member
Guy Boot

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top