SNP Excel tough one

GreenSlugg

New Member
Joined
Nov 19, 2011
Messages
7
Here is a challenge, that can probably only be solved by an Excel expert. Can this be organized into rows and columns properly? The below data is from 3 SNPs. I have a text document with just over 148,000 of these. I would like the below data to be turned into 3 rows, with as many columns as is necessary for organization. Can this be done?


<ExchangeSet>
<Report>
<Rs rsId="121913193" snpClass="snp" snpType="notwithdrawn" molType="genomic" bitField="050060080001040010100100" taxId="9606">
<Het type="est" value="0.02" stdError="0.1058"/>
<Validation byCluster="true" byFrequency="true" by1000G="true"/>
<Create build="133" date="2010-11-22 14:03"/>
<Update build="134" date="2011-10-06 16:25"/>
<Sequence exemplarSs="275515011" ancestralAllele="C,C">
<Seq5>TCCAGCCCTGGTGTTTCCCATCGGGGGCCATGTGGCCTGGAACACAGAGGCTGGGCTGCCCAAGGACAGGCCCCCTGGCCTACAAAAAGGACAGTCTTATCACAGATGTGCAATCCTTGGCACTTCCCTGTGGCGTCTGCAGTTCTGAGACTGATTTTCTTCTATAAATGTGAAAAGATAGGGGATGATAGGAAACCCGGATATAAGGCAGAACAATGTTGCTTGGGCCATCTCTTACACCTCAGTGAAA</Seq5>
<Observed>C/T</Observed>
<Seq3>GGAAAATGAAGGAAATGGAAGGCTGGCATGGGAACCCCGACTTATACAGTGATGCCCTTTTGCTGTTCCACGCGTCGCTTGGGGTGGAAGCCCTTCCTTGTCCCCTCGCCACCCACCCCGGAGCCCCTTGGTGCTTCCTTTCTGAACTGAATGCTTAGACTGGGGAACTAGAGGTGCCCGGAAAGGGAAGTCGGGAAGAAGAGACTATTCCTGTTGGTTCCTCCCCAGAGATGGGAGATGGTGAACAGGC</Seq3>
</Sequence>
<Ss ssId="275515011" handle="GEH" batchId="1052340" locSnpId="NM_005534.3:c.74-3673C>T" subSnpClass="snp" orient="forward" strand="bottom" molType="genomic" buildId="133" methodClass="sequence" validated="by-submitter">
<Sequence>
<Seq5>TCCAGCCCTGGTGTTTCCCATCGGGGGCCATGTGGCCTGGAACACAGAGGCTGGGCTGCCCAAGGACAGGCCCCCTGGCCTACAAAAAGGACAGTCTTATCACAGATGTGCAATCCTTGGCACTTCCCTGTGGCGTCTGCAGTTCTGAGACTGATTTTCTTCTATAAATGTGAAAAGATAGGGGATGATAGGAAACCCGGATATAAGGCAGAACAATGTTGCTTGGGCCATCTCTTACACCTCAGTGAAA</Seq5>
<Observed>C/T</Observed>
<Seq3>GGAAAATGAAGGAAATGGAAGGCTGGCATGGGAACCCCGACTTATACAGTGATGCCCTTTTGCTGTTCCACGCGTCGCTTGGGGTGGAAGCCCTTCCTTGTCCCCTCGCCACCCACCCCGGAGCCCCTTGGTGCTTCCTTTCTGAACTGAATGCTTAGACTGGGGAACTAGAGGTGCCCGGAAAGGGAAGTCGGGAAGAAGAGACTATTCCTGTTGGTTCCTCCCCAGAGATGGGAGATGGTGAACAGGC</Seq3>
</Sequence>
</Ss>
<Ss ssId="340995776" handle="1000GENOMES" batchId="1056111" locSnpId="20100804_snps_12639317_chr21_34783522" subSnpClass="snp" orient="forward" molType="genomic" buildId="134" methodClass="sequence">
<Sequence>
<Seq5>ATATAAGGCAGAACAATGTTGCTTGGGCCATCTCTTACACCTCAGTGAAA</Seq5>
<Observed>C/T</Observed>
<Seq3>GGAAAATGAAGGAAATGGAAGGCTGGCATGGGAACCCCGACTTATACAGT</Seq3>
</Sequence>
</Ss>
<Assembly dbSnpBuild="135" genomeBuild="37_3" groupLabel="GRCh37.p5" current="true" reference="true">
<Component componentType="contig" accession="NT_011512.11" chromosome="21" start="14338129" end="42955558" orientation="fwd" gi="224514633" groupTerm="NC_000021.8" contigLabel="GCF_000001405.17">
<MapLoc asnFrom="20445392" asnTo="20445392" locType="exact" alnQuality="1" orient="forward" physMapInt="34783521" leftFlankNeighborPos="0" rightFlankNeighborPos="0" leftContigNeighborPos="20445391" rightContigNeighborPos="20445393">
<FxnSet geneId="3460" symbol="IFNGR2" mrnaAcc="NM_005534" mrnaVer="3" fxnClass="intron-variant"/>
</MapLoc>
</Component>
<SnpStat mapWeight="unique-in-contig" chromCount="1" placedContigCount="1" unplacedContigCount="0" seqlocCount="1" hapCount="0"/>
</Assembly>
<Assembly dbSnpBuild="135" genomeBuild="37_3" groupLabel="HuRef" current="true">
<Component componentType="contig" accession="NW_001838706.1" chromosome="21" start="570586" end="24462175" orientation="fwd" gi="157697894" groupTerm="AC_000153.1" contigLabel="GCF_000002125.1">
<MapLoc asnFrom="19681727" asnTo="19681727" locType="exact" alnQuality="1" orient="forward" physMapInt="20252313" leftFlankNeighborPos="0" rightFlankNeighborPos="0" leftContigNeighborPos="19681726" rightContigNeighborPos="19681728">
<FxnSet geneId="3460" symbol="IFNGR2" mrnaAcc="NM_005534" mrnaVer="3" fxnClass="intron-variant"/>
</MapLoc>
</Component>
<SnpStat mapWeight="unique-in-contig" chromCount="1" placedContigCount="1" unplacedContigCount="0" seqlocCount="1" hapCount="0"/>
</Assembly>
<PrimarySequence dbSnpBuild="135" gi="187423893" source="remap" accession="NG_007570.1">
<MapLoc asnFrom="13320" asnTo="13320" locType="exact" alnQuality="1" orient="forward" leftContigNeighborPos="13319" rightContigNeighborPos="13321"/>
</PrimarySequence>
<RsLinkout resourceId="1" linkValue="121913193"/>
<hgvs>NC_000021.8:g.34783522C>T</hgvs>
<hgvs>NG_007570.1:g.13321C>T</hgvs>
<hgvs>NM_005534.3:c.74-3673C>T</hgvs>
<Frequency freq="0.0119" allele="T" sampleSize="2188"/>
</Rs>
</Report>

<Report>
<Rs rsId="118191823" snpClass="snp" snpType="notwithdrawn" molType="genomic" bitField="050000000004040000100100" taxId="9606">
<Het type="est" value="0.05" stdError="0.1483"/>
<Validation byFrequency="true" by1000G="true"/>
<Create build="132" date="2010-07-15 11:22"/>
<Update build="132" date="2011-09-09 11:56"/>
<Sequence exemplarSs="237932818" ancestralAllele="T,T,T">
<Seq5>TAGAAGTGTTCATTAAAAGAAAAAGTTACAGAATCATATACTTGAGACATTTTTGGAAAAATAATATATAAGCATAAGTTGTCTTTTGCCTATTTATTAAAGCCATTCTAGTGTTTGAGAAGTGGTGTTTCTTGTTTTCATGTGCATTTTCATAGTGGATAGTGATGTTGGGCATTTTTTATATGATTATTAGCCATATA</Seq5>
<Observed>C/T</Observed>
<Seq3>AATTCTTCTTTGAAGAAAGAGCTATTCAAATCCACTATACTTTTTGATTTATTTTCTTTTACTTTTTTTTAATTGGGGATTTTCTATTCATGTTATGATTTGAAATCATTCAGGACTTTGTTTCTTTTGATGCTTATAATTCTTTAACCATTTCAGAACTGACTGATGCATTTATAATGCTACAACAGAATCTTATTAGG</Seq3>
</Sequence>
<Ss ssId="237932818" handle="1000GENOMES" batchId="1056111" locSnpId="pilot_1_CEU_7537447_chr21_22395106" subSnpClass="snp" orient="forward" molType="genomic" buildId="132" methodClass="sequence">
<Sequence>
<Seq5>TAGAAGTGTTCATTAAAAGAAAAAGTTACAGAATCATATACTTGAGACATTTTTGGAAAAATAATATATAAGCATAAGTTGTCTTTTGCCTATTTATTAAAGCCATTCTAGTGTTTGAGAAGTGGTGTTTCTTGTTTTCATGTGCATTTTCATAGTGGATAGTGATGTTGGGCATTTTTTATATGATTATTAGCCATATA</Seq5>
<Observed>C/T</Observed>
<Seq3>AATTCTTCTTTGAAGAAAGAGCTATTCAAATCCACTATACTTTTTGATTTATTTTCTTTTACTTTTTTTTAATTGGGGATTTTCTATTCATGTTATGATTTGAAATCATTCAGGACTTTGTTTCTTTTGATGCTTATAATTCTTTAACCATTTCAGAACTGACTGATGCATTTATAATGCTACAACAGAATCTTATTAGG</Seq3>
</Sequence>
</Ss>
<Assembly dbSnpBuild="135" genomeBuild="37_3" groupLabel="GRCh37.p5" current="true" reference="true">
<Component componentType="contig" accession="NT_011512.11" chromosome="21" start="14338129" end="42955558" orientation="fwd" gi="224514633" groupTerm="NC_000021.8" contigLabel="GCF_000001405.17">
<MapLoc asnFrom="9135105" asnTo="9135105" locType="exact" alnQuality="1" orient="forward" physMapInt="23473234" leftContigNeighborPos="9135104" rightContigNeighborPos="9135106"/>
</Component>
<SnpStat mapWeight="unique-in-contig" chromCount="1" placedContigCount="1" unplacedContigCount="0" seqlocCount="1" hapCount="0"/>
</Assembly>
<RsLinkout resourceId="1" linkValue="118191823"/>
<hgvs>NC_000021.8:g.23473235T>C</hgvs>
<Frequency freq="0.0215" allele="C" sampleSize="2188"/>
</Rs>
</Report>

<Report>
<Rs rsId="118189973" snpClass="snp" snpType="notwithdrawn" molType="genomic" bitField="050000000004040000100100" taxId="9606">
<Het type="est" value="0.05" stdError="0.1483"/>
<Validation byFrequency="true" by1000G="true"/>
<Create build="132" date="2010-07-15 11:22"/>
<Update build="132" date="2011-09-09 11:56"/>
<Sequence exemplarSs="237966787" ancestralAllele="T,T,T">
<Seq5>AACCTAGGTTGCCAGGGAGAGACTGAATCTGGGGAGAAAACACTAGTCAGGACAGTGAATTCTAAATAGTTTTCTTTTGTATTTTCCTCAGTGATTTCCATCTCTCCATTTGTTGACTCCTTCTAACTCCAGGCTCTTTGGATAAATTTGTTTCAATATCTGGAATCCTCAAGGGGATGGAGAGAACCAAGAATGTAGGA</Seq5>
<Observed>C/T</Observed>
<Seq3>TGAGAAGGTGGGGATGAGGATGAGGCTGTCTTCCTCCTATGAGCATGAGTCAGGATACCTGCTAGAACCTCCAGGAGGGAACCAGTGGTACAGGGAGCAGGAGAGCCCAGTGGGCAGCACAGCAAAGCATTTGTCTCTCTCTGTCTCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAACCTCCCACTCCTA</Seq3>
</Sequence>
<Ss ssId="237966787" handle="1000GENOMES" batchId="1056111" locSnpId="pilot_1_CEU_7571416_chr21_33096855" subSnpClass="snp" orient="forward" molType="genomic" buildId="132" methodClass="sequence">
<Sequence>
<Seq5>AACCTAGGTTGCCAGGGAGAGACTGAATCTGGGGAGAAAACACTAGTCAGGACAGTGAATTCTAAATAGTTTTCTTTTGTATTTTCCTCAGTGATTTCCATCTCTCCATTTGTTGACTCCTTCTAACTCCAGGCTCTTTGGATAAATTTGTTTCAATATCTGGAATCCTCAAGGGGATGGAGAGAACCAAGAATGTAGGA</Seq5>
<Observed>C/T</Observed>
<Seq3>TGAGAAGGTGGGGATGAGGATGAGGCTGTCTTCCTCCTATGAGCATGAGTCAGGATACCTGCTAGAACCTCCAGGAGGGAACCAGTGGTACAGGGAGCAGGAGAGCCCAGTGGGCAGCACAGCAAAGCATTTGTCTCTCTCTGTCTCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACTGCAACCTCCCACTCCTA</Seq3>
</Sequence>
</Ss>
<Assembly dbSnpBuild="135" genomeBuild="37_3" groupLabel="GRCh37.p5" current="true" reference="true">
<Component componentType="contig" accession="NT_011512.11" chromosome="21" start="14338129" end="42955558" orientation="fwd" gi="224514633" groupTerm="NC_000021.8" contigLabel="GCF_000001405.17">
<MapLoc asnFrom="19836855" asnTo="19836855" locType="exact" alnQuality="1" orient="forward" physMapInt="34174984" leftContigNeighborPos="19836854" rightContigNeighborPos="19836856">
<FxnSet geneId="56245" symbol="C21orf62" mrnaAcc="NM_019596" mrnaVer="5" fxnClass="intron-variant"/>
<FxnSet geneId="56245" symbol="C21orf62" mrnaAcc="NM_001162495" mrnaVer="2" fxnClass="intron-variant"/>
<FxnSet geneId="56245" symbol="C21orf62" mrnaAcc="NM_001162496" mrnaVer="2" fxnClass="intron-variant"/>
</MapLoc>
</Component>
<SnpStat mapWeight="unique-in-contig" chromCount="1" placedContigCount="1" unplacedContigCount="0" seqlocCount="1" hapCount="0"/>
</Assembly>
<RsLinkout resourceId="1" linkValue="118189973"/>
<hgvs>NM_001162495.2:c.-65+4967A>G</hgvs>
<hgvs>NM_019596.5:c.-65+7899A>G</hgvs>
<hgvs>NM_001162496.2:c.-64-8189A>G</hgvs>
<hgvs>NC_000021.8:g.34174985T>C</hgvs>
<Frequency freq="0.0133" allele="C" sampleSize="2188"/>
</Rs>
</Report>
 
I was able to open it without a schema both in Excel and another application.

It kind of worked but the structure was obviously not quite right and something, eg the schema, was missing.

As to what to use for a schema, I don't know.

You could try making a copy, renaming with the xml extension and opening it in Excel.

I'm not sure if that can be done in earlier versions though.

I found a way to actually organize the data once I import it into Excel. It is kind of complicated, but I have actually found a way to organize it once it is actually in Excel.

Thank you for your time. I highly appreciate all of your attention, and taking the time out of your day to work with me. One more question. The data that I have to import takes up more than the ~1,000,000 rows that Excel allows. I can do this with no problem if it will import the 2nd million into a second worksheet, the third million into a third etc.

I had gotten a prompt from Excel to teach me how to do this on multiple occasions, but I did not need it yet, so I did not follow it ... unfortunately the prompt is no longer there. Can you tell me how to do this?
 
Upvote 0

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
Glad you found a solution.

As for the 2 million rows, there are various ways to do that.

What was suggested by the prompt?

Can you recall anything about it?

How are you currently dealing with the 1 million?

There could be something you could do there.
 
Upvote 0

Forum statistics

Threads
1,216,166
Messages
6,129,257
Members
449,497
Latest member
The Wamp

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top