Saturday, 16 January 2016

cheminformatics - Generating mol files from a molecular structure image?


I have a question regarding mol files. For example, I have this molecule:


[Image: skeletal formula of the molecule]


The mol file for this is:


  Sample

 22 23  0  0  0  0  0  0  0  0999 V2000
   -2.5962    2.2535    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5962    0.7512    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2958    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.7512    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    2.2535    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2958    3.0047    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6479    1.7042    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    1.3005    3.0047    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2958    4.5070    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2958   -1.5023    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.3005    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3005   -1.5023    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6009   -2.2535    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8967   -1.5023    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8967    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6009    0.7512    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.1972   -2.2535    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    2.6009   -3.7559    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.8967    3.0047    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.1972    3.7559    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1455    4.3052    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    3.9014   -4.5070    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  0  0  0  0
  5  6  1  0  0  0  0
  1  6  1  0  0  0  0
  1 19  1  0  0  0  0
  5  8  2  0  0  0  0
  6  9  1  0  0  0  0
  3 10  2  0  0  0  0
  4 11  1  0  0  0  0
 11 12  1  0  0  0  0
 12 13  2  0  0  0  0
 13 14  1  0  0  0  0
 14 15  2  0  0  0  0
 15 16  1  0  0  0  0
 11 16  2  0  0  0  0
 14 17  1  0  0  0  0
 13 18  1  0  0  0  0
 19  7  1  0  0  0  0
 19 20  1  0  0  0  0
 19 21  1  0  0  0  0
 18 22  2  0  0  0  0
M  END

So, I have read that lines 5-26 are for atoms, and that the first three columns represent the x, y and z coordinates of the atoms. My question is: how are the coordinates of the atoms decided? What is taken as the origin? Also, atoms are large, so they may occupy more than one coordinate; how do we decide what x, y, z values to take? Also, are H atoms not taken into account when creating mol files?



Answer



The MOL format specification doesn't specify an origin, so presumably any origin can be chosen, as long as the coordinate values fit in the allowed range. (Each coordinate is a fixed-width field with four decimal places, which allows roughly -10,000 to +100,000, but I don't know whether all programs support that large a range.) For example, in one of the examples in the specification, the atom closest to the origin is over 17.6 units away. Where the origin actually ends up depends on the software used to create the file.
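To see why the choice of origin does not matter, here is a small Python sketch (an illustration, not something from the specification) that takes three atoms from the sample file, moves the origin onto atom 1, and checks that the distances between atoms are unchanged. The helper names `translate` and `distance` are made up for this example.

```python
import math

# x, y, z of atoms 1, 2 and 6, copied from the sample file above.
atoms = {
    1: (-2.5962, 2.2535, 0.0000),
    2: (-2.5962, 0.7512, 0.0000),
    6: (-1.2958, 3.0047, 0.0000),
}

def translate(coords, new_origin):
    """Hypothetical helper: express every atom relative to a new origin."""
    return {i: tuple(c - o for c, o in zip(xyz, new_origin))
            for i, xyz in coords.items()}

def distance(a, b):
    """Straight-line distance between two (x, y, z) points."""
    return math.dist(a, b)

moved = translate(atoms, atoms[1])  # put the origin on atom 1

print(moved[1])  # (0.0, 0.0, 0.0)
for i, j in [(1, 2), (1, 6), (2, 6)]:
    before = distance(atoms[i], atoms[j])
    after = distance(moved[i], moved[j])
    # The coordinates changed, but the geometry (interatomic distances) did not.
    print(i, j, round(before, 4), round(after, 4))
```

Any program is free to make a choice like this, which is why two files describing the same molecule can have different coordinates.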


The coordinates for each atom specify the nucleus of the atom. I think you have a misconception about atomic size. The size of an atom, and thus the space it takes up, can be defined in different ways. The representation of an atom as a sphere (with the nucleus at the center) is a convention, only approximating reality, and the size of the sphere is arbitrary (but usually based on measurable properties). The nucleus is the only point in the atom that has a well-defined location, and thus the only part that meaningful coordinates can be defined for.


In MOL files, the presence of hydrogen atoms can be implied from valence and number of other bonds. (They can also be specified explicitly if needed.) In the long set of zeros at the end of each atom line (just after the element), the fourth zero specifies to add however many single bonds to hydrogen atoms are needed for the atom's valence. The sixth zero indicates to use the element's normal valence. For example, the carbon of the $\mathrm{CH_3}$ group has only one explicit bond, so three bonds to hydrogen are implied by carbon's normal valence of 4. How implicit hydrogens are handled depends on the software (e.g., this molecule has four implicit hydrogens that, per the usual convention for skeletal formulas, are not shown in your image).
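The valence arithmetic described above can be sketched in Python. This is only an illustration: the `DEFAULT_VALENCE` table is an assumption covering just the elements in this molecule, and real toolkits also account for charges, radicals, the valence field and aromatic bonds. The element list and bond block are transcribed from the sample file.

```python
# ASSUMPTION: "normal" valences for the elements in this molecule only.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}

# Element of each atom line, in file order (atom 1 first).
elements = ["C", "C", "C", "N", "C", "N", "F", "O", "C", "O", "C",
            "C", "C", "C", "C", "C", "Cl", "C", "C", "F", "F", "O"]

# Bond block: (first atom, second atom, bond order); 1 = single, 2 = double.
bonds = [(1, 2, 2), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (1, 6, 1),
         (1, 19, 1), (5, 8, 2), (6, 9, 1), (3, 10, 2), (4, 11, 1),
         (11, 12, 1), (12, 13, 2), (13, 14, 1), (14, 15, 2), (15, 16, 1),
         (11, 16, 2), (14, 17, 1), (13, 18, 1), (19, 7, 1), (19, 20, 1),
         (19, 21, 1), (18, 22, 2)]

def implicit_hydrogens(elements, bonds):
    """Default valence minus the sum of explicit bond orders, per atom."""
    used = [0] * len(elements)
    for a, b, order in bonds:
        used[a - 1] += order  # molfile atom numbers start at 1
        used[b - 1] += order
    return [max(DEFAULT_VALENCE[el] - u, 0) for el, u in zip(elements, used)]

h = implicit_hydrogens(elements, bonds)
print({i + 1: n for i, n in enumerate(h) if n})
# -> {2: 1, 9: 3, 12: 1, 15: 1, 16: 1, 18: 1}
```

Atom 9 (the $\mathrm{CH_3}$ carbon) gets three hydrogens, as described above; the remaining single hydrogens sit on carbons 2, 12, 15, 16 and 18.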


The specification for the MOL format is available from Accelrys. This answer is based on the specification and my knowledge of chemistry. Also, it only describes the MOL V2000 format, which is what your sample file uses (based on the V2000 tag in its counts line); there is a newer V3000 format.
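The V2000 layout also accounts for the "lines 5-26" in the question: three header lines, then the counts line, then one line per atom and one line per bond. A minimal Python sketch, assuming the counts line keeps its fixed-width columns (atom count in columns 1-3, bond count in 4-6, version tag in 34-39) and a standard three-line header; `block_ranges` is a hypothetical helper name.

```python
def block_ranges(counts_line):
    """Return the version tag and the 1-based file line ranges of the
    atom and bond blocks, assuming a three-line header before the counts line."""
    n_atoms = int(counts_line[0:3])           # columns 1-3
    n_bonds = int(counts_line[3:6])           # columns 4-6
    version = counts_line[33:39].strip()      # columns 34-39, e.g. "V2000"
    first_atom = 5                            # lines 1-3 header, line 4 counts
    last_atom = first_atom + n_atoms - 1
    first_bond = last_atom + 1
    last_bond = first_bond + n_bonds - 1
    return version, (first_atom, last_atom), (first_bond, last_bond)

# Counts line of the sample file, with its fixed-width spacing.
counts = " 22 23  0  0  0  0  0  0  0  0999 V2000"
print(block_ranges(counts))  # ('V2000', (5, 26), (27, 49))
```

A V3000 file carries "V3000" in the same position and stores atoms and bonds in a different block layout, so a reader should check the tag before assuming the V2000 line structure.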


