
Molecule to atoms

Posted: 2015-07-10 00:11:30


For a given chemical formula represented by a string, count the number of atoms of each element contained in the molecule and return an object.

water = 'H2O'
parse_molecule(water)
# returns {'H': 2, 'O': 1}

magnesium_hydroxide = 'Mg(OH)2'
parse_molecule(magnesium_hydroxide)
# returns {'Mg': 1, 'O': 2, 'H': 2}

fremy_salt = 'K4[ON(SO3)2]2'
parse_molecule(fremy_salt)
# returns {'K': 4, 'O': 14, 'N': 2, 'S': 4}

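The gist of this problem is to turn a molecular formula into its atoms, represented as a dictionary. On Codewars it is rated 3 kyu. The hard parts are analyzing all the different cases, avoiding out-of-range indexing, and coping with the various rules that formulas follow.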

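My approach is roughly this: first convert square brackets and curly braces to parentheses, then expand the innermost parentheses, then the outer ones, until the formula has no parentheses left, which is easy to process. That raises the question of how to find the innermost parentheses. My take is to find the first ')' and then search backwards for the '(' that matches it, and replace '(...)2' with the expanded result. I write 2 here to stand for the number after the closing parenthesis. That number may be 1, in which case it is simply omitted, so the 1 has to be put back during the conversion. The same applies in the final pass: an omitted 1 still has to be counted.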
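The "first ')' and then the nearest '(' before it" rule can be sketched in isolation. This is only an illustration under stated assumptions: `find_innermost` is a hypothetical helper name, not part of the solution, and the input is assumed to be well formed and already converted to parentheses.

```python
def find_innermost(formula):
    """Return (open_index, close_index) of the innermost group, or None.

    Assumes brackets are balanced and already rewritten as parentheses.
    """
    close = formula.find(')')           # first ')' in the whole string
    if close == -1:
        return None                     # no groups left to expand
    # Last '(' before that ')': nothing can nest between the two.
    return formula.rfind('(', 0, close), close


formula = 'K4(ON(SO3)2)2'               # Fremy's salt after normalization
start, end = find_innermost(formula)
print(formula[start:end + 1])           # (SO3)
```

Because the first ')' cannot have another ')' before it, the group between it and the nearest preceding '(' contains no nested parentheses, which is why it is safe to expand first.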

The code is as follows:

def parse_molecule(formula):
    formula_dict = {}
    # Replace [] and {} with ()
    for bracket in '[{':
        formula = formula.replace(bracket, '(')
    for bracket in ']}':
        formula = formula.replace(bracket, ')')

    has_bracket = '(' in formula
    while has_bracket:
        # Find the innermost (): the first ')' and the nearest '(' before it
        for i in range(len(formula)):
            if formula[i] == ')':
                break
        for j in range(len(formula[:i])-1, -1, -1):
            if formula[j] == '(':
                break
        # If the multiplier 1 was omitted, add it back
        if i+1 == len(formula) or not formula[i+1].isdigit():
            sub_formula = formula[j: i+1]
            # A temporary variable keeps the later replace working. With
            # sub_formula = formula[j: i+1] + '1' directly, sub_formula
            # would no longer be a substring of formula, the replace would
            # do nothing, and this loop would never end.
            tmp = sub_formula + '1'
        else:
            # Take the group together with its one-digit multiplier
            sub_formula = formula[j: i+2]
            tmp = sub_formula
        parsed_sub_formula = parse_paren(tmp)
        # Replace only the first match, which is the group just found
        formula = formula.replace(sub_formula, parsed_sub_formula, 1)
        has_bracket = '(' in formula
    # Process the formula now that it has no () left
    i = 0
    while i < len(formula):
        j = i+1
        if j < len(formula) and formula[j].islower():
            j += 1
        tmp = formula[i: j]
        # Check the bounds so that j never runs past the end.
        # Known bug: subscripts are assumed to have at most two digits. With
        # three digits, the third digit is treated as an element with
        # subscript 1. To my surprise the solution still passed.
        if j < len(formula) and formula[j].isdigit():
            k = j+1
            if k < len(formula) and formula[k].isdigit():
                formula_dict[tmp] = formula_dict.get(tmp, 0) + int(formula[j: k+1])
                i = k+1
            else:
                formula_dict[tmp] = formula_dict.get(tmp, 0) + int(formula[j])
                i = j+1
        elif j < len(formula) and formula[j].isupper():
            formula_dict[tmp] = formula_dict.get(tmp, 0) + 1
            i = j
        elif j == len(formula):
            formula_dict[tmp] = formula_dict.get(tmp, 0) + 1
            break

    return formula_dict

def parse_paren(sub_formula):
    # sub_formula looks like '(...)N'; returns the expanded group as a string
    result = {}
    # The multiplier is the last character (a single digit)
    times = int(sub_formula[-1])
    i = 1
    while i < len(sub_formula)-2:
        j = i+1
        if sub_formula[j].islower():
            j += 1
        tmp = sub_formula[i: j]
        if sub_formula[j].isdigit():
            k = j+1
            # Here too, subscripts are assumed to have at most two digits
            if k < len(sub_formula)-2 and sub_formula[k].isdigit():
                result[tmp] = result.get(tmp, 0) + int(sub_formula[j: k+1])*times
                i = k+1
            else:
                result[tmp] = result.get(tmp, 0) + int(sub_formula[j])*times
                i = j+1
        elif sub_formula[j].isupper() or sub_formula[j] == ')':
            result[tmp] = result.get(tmp, 0) + 1*times
            i = j

    t = []
    for key, val in result.items():
        t.append(key)
        t.append(str(val))
    return ''.join(t)

# For testing I deliberately threw in some messy formulas that still follow the rules
print(parse_molecule('K4[ON(SO3)2]2'))
print(parse_molecule('(H2O)H10'))
print(parse_molecule('(OH123)2'))
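The two-digit limit mentioned in the comments comes from checking digits one position at a time. A minimal sketch of one way around it, assuming a formula that no longer contains any brackets, is to consume the whole run of digits in one go. `read_count` and `count_flat` are hypothetical names used only for this illustration and are not part of the solution above.

```python
def read_count(text, pos):
    """Read the run of digits starting at pos.

    Returns (count, next_pos). A missing count means 1.
    """
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    count = int(text[pos:end]) if end > pos else 1
    return count, end


def count_flat(formula):
    """Count atoms in a bracket-free formula such as 'H2OH10'."""
    counts = {}
    i = 0
    while i < len(formula):
        # An element symbol: one character plus any lowercase letters.
        j = i + 1
        while j < len(formula) and formula[j].islower():
            j += 1
        element = formula[i:j]
        n, i = read_count(formula, j)   # i now points past the subscript
        counts[element] = counts.get(element, 0) + n
    return counts


print(count_flat('OH123'))    # {'O': 1, 'H': 123}
print(count_flat('H2OH10'))   # {'H': 12, 'O': 1}
```

The same `read_count` idea would also apply to the multiplier after a closing parenthesis, which the code above reads as a single digit.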

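It did pass, but I will fix the bugs in the code when I have time (who knows when that will be; this one really wore me out, so next time, next time... my skills are just not there yet).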

That said, regular expressions look like a better fit for this, so stay tuned...
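For a rough idea of what a regex version might look like, here is one possible shape. It is only a sketch under stated assumptions, not the follow-up promised above: it swaps the innermost-first string rewriting for a regex tokenizer plus a stack of dictionaries, the name `parse_molecule_re` is made up for this example, and the input is assumed to be well formed (balanced brackets, no check that bracket types match).

```python
import re

# One token is an element with an optional count, an opening bracket,
# or a closing bracket with an optional multiplier.
TOKEN = re.compile(r'([A-Z][a-z]*)(\d*)|([(\[{])|([)\]}])(\d*)')


def parse_molecule_re(formula):
    stack = [{}]                        # one dict per open bracket level
    for match in TOKEN.finditer(formula):
        element, count, opener, closer, mult = match.groups()
        if element:
            n = int(count) if count else 1
            stack[-1][element] = stack[-1].get(element, 0) + n
        elif opener:
            stack.append({})            # start counting a new group
        elif closer:
            group = stack.pop()         # finished group
            k = int(mult) if mult else 1
            for el, n in group.items():
                stack[-1][el] = stack[-1].get(el, 0) + n * k
    return stack[0]


print(parse_molecule_re('K4[ON(SO3)2]2'))  # {'K': 4, 'O': 14, 'N': 2, 'S': 4}
print(parse_molecule_re('(OH123)2'))       # {'O': 2, 'H': 246}
```

Since `\d*` matches a whole run of digits, subscripts and multipliers of any length are handled without special cases.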

 


Original source: http://www.cnblogs.com/FARAMIR/p/4634470.html
