Hello everyone!
I'm currently trying to make an Excel spreadsheet of around 19 000 chemical compounds with their name, formula, mass and database links (CAS number etc.). All the info I need is taken from a text file, which was generated by a script. The text file in question looks like this:
Code:
ENTRY: C00001 Compound NAME: H2O; Water FORMULA: H2O EXACT_MASS: 18.0106 DBLINKS: CAS: 7732-18-5 PubChem: 3303 ChEBI: 15377 PDB-CCD: HOH 3DMET: B01124 NIKKAJI: J43.587B
ENTRY: C00002 Compound NAME: ATP; Adenosine 5'-triphosphate FORMULA: C10H16N5O13P3 EXACT_MASS: 506.9957 DBLINKS: CAS: 56-65-5 PubChem: 3304 ChEBI: 15422 KNApSAcK: C00001491 PDB-CCD: ATP 3DMET: B01125 NIKKAJI: J10.680A
ENTRY: C00003 Compound NAME: NAD+; NAD; Nicotinamide adenine dinucleotide; DPN; Diphosphopyridine nucleotide; Nadide FORMULA: C21H28N7O14P2 EXACT_MASS: 664.1169 DBLINKS: CAS: 53-84-9 PubChem: 3305 ChEBI: 15846 KNApSAcK: C00007256 PDB-CCD: NAD NAJ 3DMET: B01126 NIKKAJI: J136.554A
ENTRY: C00004 Compound NAME: NADH; DPNH; Reduced nicotinamide adenine dinucleotide FORMULA: C21H29N7O14P2 EXACT_MASS: 665.1248 DBLINKS: CAS: 58-68-4 PubChem: 3306 ChEBI: 16908 KNApSAcK: C00019343 PDB-CCD: NAI 3DMET: B01127 NIKKAJI: J213.546I
ENTRY: C00005 Compound NAME: NADPH; TPNH; Reduced nicotinamide adenine dinucleotide phosphate FORMULA: C21H30N7O17P3 EXACT_MASS: 745.0911 DBLINKS: CAS: 2646-71-1 PubChem: 3307 ChEBI: 16474 KNApSAcK: C00019545 PDB-CCD: NDP 3DMET: B01128 NIKKAJI: J208.978E
ENTRY: C00006 Compound NAME: NADP+; NADP; Nicotinamide adenine dinucleotide phosphate; beta-Nicotinamide adenine dinucleotide phosphate; TPN; Triphosphopyridine nucleotide FORMULA: C21H29N7O17P3 EXACT_MASS: 744.0833 DBLINKS: CAS: 53-59-8 PubChem: 3308 ChEBI: 18009 PDB-CCD: NAP 3DMET: B01129 NIKKAJI: J247.824B
ENTRY: C00007 Compound NAME: Oxygen; O2 FORMULA: O2 EXACT_MASS: 31.9898 DBLINKS: CAS: 7782-44-7 PubChem: 3309 ChEBI: 15379 PDB-CCD: OXY 3DMET: B00001 NIKKAJI: J44.420K
ENTRY: C00008 Compound NAME: ADP; Adenosine 5'-diphosphate FORMULA: C10H15N5O10P2 EXACT_MASS: 427.0294 DBLINKS: CAS: 58-64-0 PubChem: 3310 ChEBI: 16761 KNApSAcK: C00019353 PDB-CCD: ADP 3DMET: B01130 NIKKAJI: J10.683F
ENTRY: C00009 Compound NAME: Orthophosphate; Phosphate; Phosphoric acid; Orthophosphoric acid FORMULA: H3PO4 EXACT_MASS: 97.9769 DBLINKS: CAS: 7664-38-2 PubChem: 3311 ChEBI: 18367 KNApSAcK: C00007408 PDB-CCD: 2HP PI PO4 3DMET: B00002 NIKKAJI: J3.746J
ENTRY: C00010 Compound NAME: CoA; Coenzyme A; CoA-SH FORMULA: C21H36N7O16P3S EXACT_MASS: 767.1152 DBLINKS: CAS: 85-61-0 PubChem: 3312 ChEBI: 15346 KNApSAcK: C00007258 PDB-CCD: COA COZ 3DMET: B04618 NIKKAJI: J192.630F
The problem I am facing is that, when the file gets imported into Excel, the columns get misaligned because the compounds do not all have the same number of alternative names. The text file also contains entries which lack a mass and chemical formula.
This wouldn't normally be a problem if I only had to deal with 10 or so compounds (then I would just fix it manually), but when I have to manage about 19 000 of them...
The end result I am after looks pretty much like this:
Code:
ENTRY NAME FORMULA MASS DBLINKS
1 H2O; Water H2O 18.0106 CAS: 7732-18-5 PubChem: 3303 ChEBI: 15377 PDB-CCD: HOH 3DMET: B01124 NIKKAJI: J43.587B
I am guessing I need a macro of some sort to get the result I want. However, I am not very experienced in working with Excel or programming in general. I will appreciate any help I can get.
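For illustration, here is a rough sketch of how such a conversion might be done in Python rather than with an Excel macro. The idea is to split each line on the field labels (ENTRY:, NAME:, FORMULA:, EXACT_MASS:, DBLINKS:) instead of on spaces, so that a varying number of names or a missing formula or mass cannot shift the columns. It assumes every compound sits on a single line, as in the sample above; the names `parse_line` and `convert` and the file names in the usage note are hypothetical.

```python
import csv
import re

# Column order of the output; these are the labels seen in the sample file.
FIELDS = ["ENTRY", "NAME", "FORMULA", "EXACT_MASS", "DBLINKS"]

# Matches one of the known labels followed by a colon and captures the label.
LABEL_RE = re.compile(r"\b(ENTRY|NAME|FORMULA|EXACT_MASS|DBLINKS):\s*")


def parse_line(line):
    """Split one line into a dict; fields missing from the line stay empty."""
    record = dict.fromkeys(FIELDS, "")
    # With a capturing group, re.split returns
    # [text before first label, label1, value1, label2, value2, ...].
    parts = LABEL_RE.split(line.strip())
    for label, value in zip(parts[1::2], parts[2::2]):
        record[label] = value.strip()
    # Keep only the ID ("C00001") and drop the trailing word "Compound".
    record["ENTRY"] = record["ENTRY"].partition(" ")[0]
    return record


def convert(src_path, dst_path):
    """Write one CSV row per non-empty input line (assumption: one compound per line)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(parse_line(line))
```

Calling something like `convert("compounds.txt", "compounds.csv")` would then produce a CSV file that Excel can open with one compound per row. The database links are kept together in a single DBLINKS cell, and the ENTRY column holds the full ID (C00001) rather than a plain number.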
Thanks!
Clanrat