Extract some information from results of opt and scrf Gaussian jobs #278
Comments
@fwmeng88 actually, I just needed to get some geometry information out of the Gaussian log file, so it would be very useful to parse more information out of the log file. As far as I can tell, we don't have any unmerged implementation, so please feel free to share your work.
Thanks for letting me know. @FarnazH Here is the script that I wrote,

import re

import pandas as pd
from iodata.utils import LineIterator

__author__ = "Fanwang Meng @ Ayers Lab"
__date__ = "2021.April.24"
__version__ = "0.0.2"


def extract_qm_log(log_fpath,
                   tag=None,
                   output_fname=None):
    """Extract quantum chemical descriptors from Gaussian optimization log file."""
    lit = LineIterator(log_fpath)
    data_dict = {}
    electro_spat_ext = []
    nuclear_repulsion_energies = []
    # R6Disp: Grimme-D2 Dispersion energy
    dispersion_energies = []
    # nuclear repulsion after empirical dispersion term
    nuclear_repulsion_dispersion = []
    # orbital energies; reset at every population analysis block
    alpha_occ_eigenvalues = []
    alpha_virt_eigenvalues = []

    while True:
        try:
            line = next(lit).strip()
        except StopIteration:
            break
        # dipole moment
        if line.startswith("Dipole moment"):
            line = next(lit).strip()
            dipole_list = line.split()
            data_dict["dipole_x"] = float(dipole_list[1])
            data_dict["dipole_y"] = float(dipole_list[3])
            data_dict["dipole_z"] = float(dipole_list[5])
            data_dict["dipole_total"] = float(dipole_list[7])
        # quadrupole moment
        elif line.startswith("Quadrupole moment"):
            line = next(lit).strip()
            quadropole_list = line.split()
            data_dict["quadropole_xx"] = float(quadropole_list[1])
            data_dict["quadropole_yy"] = float(quadropole_list[3])
            data_dict["quadropole_zz"] = float(quadropole_list[5])
            line = next(lit).strip()
            quadropole_list = line.split()
            data_dict["quadropole_xy"] = float(quadropole_list[1])
            data_dict["quadropole_xz"] = float(quadropole_list[3])
            data_dict["quadropole_yz"] = float(quadropole_list[5])
        # electronic spatial extent (au)
        elif line.startswith("Electronic spatial extent"):
            electro_spat_ext.append(float(line.split()[-1]))
        # reset the lists so that only the last record is stored
        elif line.startswith("Population analysis using the SCF Density"):
            alpha_occ_eigenvalues = []
            alpha_virt_eigenvalues = []
        # The last value in the Alpha occ. eigenvalues gives the HOMO energy and the first
        # value in the Alpha virt. eigenvalues gives the LUMO energy.
        # HOMO
        elif line.startswith("Alpha  occ. eigenvalues --"):
            alpha_occ_eigenvalues.extend(line.split()[4:])
        # LUMO
        elif line.startswith("Alpha virt. eigenvalues --"):
            alpha_virt_eigenvalues.extend(line.split()[4:])
        # rotational constants
        elif line.startswith("Rotational constants (GHZ)"):
            rotational_constants = line.split()[3:]
            data_dict["rot_const_x"] = float(rotational_constants[0])
            data_dict["rot_const_y"] = float(rotational_constants[1])
            data_dict["rot_const_z"] = float(rotational_constants[2])
        # symmetry point group
        elif line.startswith("Full point group"):
            data_dict["point_group"] = line.split()[3]
        # nuclear repulsion energy in Hartrees
        elif line.startswith("nuclear repulsion energy"):
            nuclear_repulsion_energies.append(line.split()[-2])
        # R6Disp: Grimme-D2 Dispersion energy in Hartrees
        elif line.startswith("R6Disp: Grimme-D2 Dispersion energy"):
            dispersion_energies.append(line.split()[-2])
        # nuclear repulsion after empirical dispersion term
        elif line.startswith("Nuclear repulsion after empirical dispersion term"):
            nuclear_repulsion_dispersion.append(line.split()[-2])
        # PCM non-electrostatic energy
        elif line.startswith("PCM non-electrostatic energy"):
            data_dict["PCM_non_electrostatic_energy"] = float(line.split()[-2])
        # nuclear repulsion after PCM non-electrostatic terms
        elif line.startswith("Nuclear repulsion after PCM non-electrostatic terms"):
            data_dict["nuclear_repulsion_after_pcm"] = float(line.split()[-2])
        # KE, PE and EE
        elif line.startswith("KE="):
            data_dict["KE"] = float(line.split()[1].replace("D", "e"))
            data_dict["PE"] = float(line.split()[2].split("=")[-1].replace("D", "e"))
            data_dict["EE"] = float(line.split()[-1].replace("D", "e"))
        # SMD-CDS (non-electrostatic) energy, kcal/mol
        elif line.startswith("SMD-CDS (non-electrostatic) energy"):
            data_dict["SMD-CDS"] = float(line.split()[-1])
        # GePol: Number of generator spheres
        elif line.startswith("GePol: Number of generator spheres"):
            data_dict["GePol_num_gen_spheres"] = int(line.split()[-1])
        # GePol: Total number of spheres
        elif line.startswith("GePol: Total number of spheres "):
            data_dict["GePol_total_num_spheres"] = int(line.split()[-1])
        # GePol: Number of exposed spheres
        elif line.startswith("GePol: Number of exposed spheres"):
            data_dict["GePol_num_exposed_spheres"] = int(re.split(r"=|\(", line)[1])
        # GePol: Number of points
        elif line.startswith("GePol: Number of points ="):
            data_dict["GePol_num_points"] = int(line.split()[-1])
        # GePol: Average weight of points
        elif line.startswith("GePol: Average weight of points"):
            data_dict["GePol_average_weight"] = float(line.split()[-1])
        # GePol: Minimum weight of points
        elif line.startswith("GePol: Minimum weight of points"):
            data_dict["GePol_minimum_weight"] = float(line.split()[-1].replace("D", "e"))
        # GePol: Maximum weight of points
        elif line.startswith("GePol: Maximum weight of points"):
            data_dict["GePol_maximum_weight"] = float(line.split()[-1].replace("D", "e"))
        # GePol: Number of points with low weight
        elif line.startswith("GePol: Number of points with low weight"):
            data_dict["GePol_num_points_low_weight"] = int(line.split()[-1])
        # GePol: Fraction of low-weight points (<1% of avg)
        elif line.startswith("GePol: Fraction of low-weight"):
            data_dict["GePol_frac_low_weight"] = float(line.split()[-1].strip('%')) / 100
        # GePol: Cavity surface area, ang**2
        elif line.startswith("GePol: Cavity surface area"):
            data_dict["GePol_cavity_surface"] = float(line.split()[-2])
        # GePol: Cavity volume, ang**3
        elif line.startswith("GePol: Cavity volume"):
            data_dict["GePol_cavity_volume"] = float(line.split()[-2])

    # keep only the last record of each repeated quantity
    data_dict["electro_spat_ext"] = electro_spat_ext[-1]
    data_dict["HOMO"] = float(alpha_occ_eigenvalues[-1])
    data_dict["LUMO"] = float(alpha_virt_eigenvalues[0])
    data_dict["grimme_D2_dispersion_energy"] = float(dispersion_energies[-1])
    data_dict["nuclear_repulsion_energy"] = float(nuclear_repulsion_energies[-1])
    data_dict["nuclear_repulsion_dispersion"] = float(nuclear_repulsion_dispersion[-1])

    if tag is not None:
        data_dict = {k + "_" + tag: v for (k, v) in data_dict.items()}
    df = pd.DataFrame(data_dict, index=[0])
    if output_fname:
        if output_fname.endswith(".csv"):
            df.to_csv(output_fname, sep=",", index=False)
        elif output_fname.endswith(".xlsx") or output_fname.endswith(".xls"):
            df.to_excel(output_fname, index=False)
    return data_dict, df
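To make the token indexing in the script easier to follow, here is a minimal sketch of two of its parsing steps in isolation: splitting a dipole line into label/value pairs, and converting Fortran-style `D` exponents. The helper names `parse_dipole_line` and `fortran_float` are hypothetical and not part of IOData, and the sample line is illustrative rather than copied from the attached log files. It assumes each label and its value are separated by whitespace.

```python
def fortran_float(token):
    """Convert a Fortran-style number such as '7.567D+01' to a Python float."""
    return float(token.replace("D", "e").replace("d", "e"))


def parse_dipole_line(line):
    """Parse a line of the form 'X= .. Y= .. Z= .. Tot= ..' into a dict."""
    fields = line.split()
    # Labels sit at even positions and values at odd positions,
    # which is why the script reads indices 1, 3, 5 and 7.
    labels = [label.rstrip("=").lower() for label in fields[0::2]]
    values = [float(value) for value in fields[1::2]]
    return dict(zip(labels, values))


# Illustrative line (assumption), not taken from test_opt.log or test_scrf.log.
sample = "X=  0.0000  Y=  0.0000  Z= -2.0638  Tot=  2.0638"
dipole = parse_dipole_line(sample)
kinetic = fortran_float("7.567D+01")
```

If a negative value were fused to its label (as in `PE=-2.606D+02`), this whitespace split would no longer produce alternating labels and values, which is why the script handles the `PE` token separately with `split("=")`.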
@fwmeng88 This would be a welcome addition! I have a small question, but it is likely not an issue: long chained if/elif statements could be replaced with a dictionary of functions:

def func1(line):
    # do something with line
    ...


def func2(line):
    # do something different with line
    ...


funcs = {"begin1": func1, "begin2": func2}
funcs[line[:6]](line)

This would not have a cost that scales linearly with the number of such functions.
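The dictionary idea could be adapted to the prefixes in the script roughly as follows. This is only a sketch under assumptions: the handler names, `HANDLERS`, `PREFIX_LENGTHS` and `dispatch` are made up for illustration, and the sample lines are invented rather than taken from a real log. Because the prefixes differ in length, the lookup is tried once per distinct prefix length, so the cost grows with the number of distinct lengths rather than with the number of handlers.

```python
def parse_point_group(line, data):
    """Store the point group label (fourth whitespace-separated token)."""
    data["point_group"] = line.split()[3]


def parse_spatial_extent(line, data):
    """Store the electronic spatial extent (last token on the line)."""
    data["electro_spat_ext"] = float(line.split()[-1])


# Keys are full line prefixes, so they may differ in length.
HANDLERS = {
    "Full point group": parse_point_group,
    "Electronic spatial extent": parse_spatial_extent,
}
# One dictionary lookup is needed per distinct prefix length.
PREFIX_LENGTHS = sorted({len(prefix) for prefix in HANDLERS}, reverse=True)


def dispatch(line, data):
    """Call the handler whose prefix starts the line; return True if one matched."""
    for length in PREFIX_LENGTHS:
        handler = HANDLERS.get(line[:length])
        if handler is not None:
            handler(line, data)
            return True
    return False


data = {}
# Invented sample lines (assumption), for illustration only.
sample_lines = [
    "Full point group                 C2V     NOp   4",
    "Electronic spatial extent (au):  <R**2>=             19.0000",
    "An unrelated line",
]
matched = [dispatch(line.strip(), data) for line in sample_lines]
```

Handlers that need to read follow-up lines, as the dipole and quadrupole branches do, would also have to receive the line iterator as an argument.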
Thanks for the suggestions. @tovrstra In my experience, this parsing is fast, taking ~5-10 seconds. I think I will follow this style for now and fix it if it becomes a bottleneck.
I am working on a project that requires the following information from the geometry optimization result (log file) produced by Gaussian 16,
and the following from the SCRF job,
I know you are working heavily on database construction, so you may already have an unmerged implementation for this. To avoid duplicated work, could you share your work if you have some already? @leila-pujal @FarnazH Thanks!
Otherwise, I can try to implement this feature and try to merge it into IOData.
test_opt.log
test_scrf.log