The mssql module contains a variety of functions to interact with MSSQL databases through Python and Pandas.
Reading tables
-
pdsql.mssql.
rd_sql
(server, database, table=None, col_names=None, where_in=None, where_op='AND', geo_col=False, from_date=None, to_date=None, date_col=None, rename_cols=None, stmt=None, con=None)[source] Function to import data from an MSSQL database.
- Parameters
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
col_names (list of str) – The column names that should be retrieved. e.g.: [�SiteID’, �BandNo’, �RecordNo’]
where_in (dict) – A dictionary of strings to lists of strings.’. e.g.: {�SnapshotType’: [�value1’, �value2’]}
where_op (str) – If where_in is a dictionary and there are more than one key, then the operator that connects the where statements must be either �AND’ or �OR’.
geo_col (bool) – Is there a geometry column in the table?.
from_date (str) – The start date in the form �2010-01-01’.
to_date (str) – The end date in the form �2010-01-01’.
date_col (str) – The SQL table column that contains the dates.
rename_cols (list of str) – List of strings to rename the resulting DataFrame column names.
stmt (str) – Custom SQL statement to be directly passed to the database. This will ignore all prior arguments except server and database.
con (SQLAlchemy connectable (engine/connection) or database string URI) – The sqlalchemy connection to be passed to pandas.read_sql
- Returns
- Return type
DataFrame
-
pdsql.mssql.
rd_sql_ts
(server, database, table, groupby_cols, date_col, values_cols, resample_code=None, period=1, fun='mean', val_round=3, where_in=None, where_op='AND', from_date=None, to_date=None, min_count=None, con=None)[source] Function to specifically read and possibly aggregate time series data stored in MSSQL tables.
- Parameters
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
groupby_cols (str or list of str) – The columns in the SQL table to grouped and returned with the time series data.
date_col (str) – The date column in the table.
values_cols (str or list of str) – The column(s) of the value(s) that should be resampled.
resample_code (str or None) – The Pandas time series resampling code. e.g. �D’ for day, �W’ for week, �M’ for month, etc.
period (int) – The number of resampling periods. e.g. period = 2 and resample = �D’ would be to resample the values over a 2 day period.
fun (str) – The resampling function. i.e. mean, sum, count, min, or max. No median yet…
val_round (int) – The number of decimals to round the values.
where_in (dict) – A dictionary of strings to lists of strings.’. e.g.: {�SnapshotType’: [�value1’, �value2’]}
where_op (str) – If where_in is a dictionary and there are more than one key, then the operator that connects the where statements must be either �AND’ or �OR’.
from_date (str) – The start date in the form �2010-01-01’.
to_date (str) – The end date in the form �2010-01-01’.
min_count (int) – The minimum number of values required to return groupby_cols. Only works when groupby_cols and vlue_cols are str.
con (SQLAlchemy connectable (engine/connection) or database string URI) – The sqlalchemy connection to be passed to pandas.read_sql
- Returns
Pandas DataFrame with MultiIndex of groupby_cols and date_col
- Return type
DataFrame
-
pdsql.mssql.
rd_sql_geo
(server, database, table, col_stmt, where_lst=None)[source] Function to extract the geometry and coordinate system from an SQL geometry field. Returns a shapely geometry object and a proj4 str.
- Parameters
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
where_lst (list) – A list of where statements to be passed and added to the final SQL statement.
- Returns
list of shapely geometry objects – The main output is a list of shapely geometry objects for all queried rows of the SQL table.
str – The second output is a proj4 str of the projection system.
Creating tables
-
pdsql.mssql.
create_table
(server, database, table, dtype_dict, primary_keys=None, foreign_keys=None, foreign_table=None, drop_table=False, con=None)[source] Function to create a table in an mssql database.
- Parameters
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
dtype_dict (dict of str) – Dictionary of df columns to the associated sql data type. Examples below.
primary_keys (str or list of str) – Index columns to define uniqueness in the data structure.
foreign_keys (str or list of str) – Columns to link to another table in the same database.
foreign_table (str) – The table in the same database with the identical foreign key(s).
drop_table (bool) – If the table already exists, should it be dropped?
- Returns
- Return type
Writing to tables
-
pdsql.mssql.
to_mssql
(df, server, database, table, index=False, dtype=None, schema=None)[source] Function to append a DataFrame onto an existing mssql table.
- Parameters
df (DataFrame) – DataFrame to be saved. The DataFrame column/index names must match those on the mssql table exactly.
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
index (bool) – Should the index be added as a column?
dtype (dict of column name to SQL type, default None) – Optional specifying the datatype for columns. The SQL type should be an SQLAlchemy type.
- Returns
- Return type
Updating tables
-
pdsql.mssql.
update_table_rows
(df, server, database, table, on=None, index=False, append=True)[source] Function to update rows from an mssql table. SQL table must have a primary key and the primary key must be in the input DataFrame.
- Parameters
df (DataFrame) – DataFrame with data to be overwritten in SQL table.
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
on (None, str, or list of str) – The index by which the update should be applied on. If None, then it uses the existing primary key(s).
index (bool) – Does the df have an index that corresponds to the SQL table primary keys?
append (bool) – Should new sites be appended to the table?
- Returns
- Return type
-
pdsql.mssql.
update_from_difference
(df, server, database, table, on=None, index=False, append=True, mod_date_col=False)[source] Function to update rows from an mssql table from the difference between a DataFrame and the existing SQL table.
- Parameters
df (DataFrame) – DataFrame with data to be overwritten in SQL table.
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
on (None, str, or list of str) – The index by which the update should be applied on. If None, then it uses the existing primary key(s).
index (bool) – Does the df have an index that corresponds to the SQL table primary keys?
append (bool) – Should new sites be appended to the table?
mod_date_col (str or None) – Name of the modification date column to be updated. None if it doesn’t exist.
- Returns
Of the results that were updated in SQL.
- Return type
DataFrame
Deleting rows in tables
-
pdsql.mssql.
del_table_rows
(server, database, table=None, pk_df=None, stmt=None)[source] Function to selectively delete rows from an mssql table.
- Parameters
server (str) – The server name. e.g.: �SQL2012PROD03’
database (str) – The specific database within the server. e.g.: �LowFlows’
table (str or None if stmt is a str) – The specific table within the database. e.g.: �LowFlowSiteRestrictionDaily’
pk_df (DataFrame) – A DataFrame of the primary keys of the table for the rows that should be removed.
stmt (str) – SQL delete statement. Will override everything except server and database.
- Returns
- Return type
Notes
Using the pk_df is the only way to ensure that specific rows will be deleted from composite keys. The column data types and names of pk_df must match the equivelant columns in the SQL table. The procedure creates a temporary table from the pk_df then deletes the rows in the target table based on the temp table. Then finally deletes the temp table.
Helper functions
-
pdsql.mssql.
sql_where_stmts
(where_in=None, where_op='AND', from_date=None, to_date=None, date_col=None)[source] Function to take various input parameters and convert them to a list of where statements for SQL.
- Parameters
where_in (dict) – Either a str with an associated where_val list or a dictionary of string keys to list values. If a str, it should represent the table column associated with the �where’ condition.
where_in – A dictionary of strings to lists of strings.’. e.g.: {�SnapshotType’: [�value1’, �value2’]}
where_op (str of either 'AND' or 'OR') – The binding operator for the where conditions.
from_date (str or None) – The start date in the form �2010-01-01’.
to_date (str or None) – The end date in the form �2010-01-01’.
date_col (str or None) – The SQL table column that contains the dates.
- Returns
Returns a list of str where conditions to be passed to an SQL execution function. The function needs to bind it with ” where ” + ” and “.join(where_lst)
- Return type
list of str or None
-
pdsql.mssql.
sql_ts_agg_stmt
(table, groupby_cols, date_col, values_cols, resample_code, period=1, fun='mean', val_round=3, where_lst=None)[source] Function to create an SQL statement to pass to an SQL driver to resample a time series table.
- Parameters
table (str) – The SQL table name.
groupby_cols (str or list of str) – The columns in the SQL table to grouped and returned with the time series data.
date_col (str) – The date column in the table.
values_cols (str or list of str) – The column(s) of the value(s) that should be resampled.
resample_code (str) – The Pandas time series resampling code. e.g. �D’ for day, �W’ for week, �M’ for month, etc.
period (int) – The number of resampling periods. e.g. period = 2 and resample = �D’ would be to resample the values over a 2 day period.
fun (str) – The resampling function. i.e. mean, sum, count, min, or max. No median yet…
val_round (int) – The number of decimals to round the values.
where_lst (list or None) – A list of where statements to be passed and added to the final SQL statement.
- Returns
A full SQL statement that can be passed directly to an SQL connection driver through pandas read_sql function.
- Return type