Entity Framework Core - .Contains() - Why escaping instead of parametrization?

asp.net-core c# entity-framework-core linq sql

Question

In a web application, I get user input in the form of a List<string> from the ViewModel and use this information to select the Ids of the users with the following code:

var selectedUsersIds = Context.Users
.Where(user => SelectedUsers.Contains(user.Email))
.Select(user => user.Id)
.ToList();

SelectedUsers is the list of strings (user emails).

Now, while viewing the application logs I encountered the following log entry:

info: Microsoft.Data.Entity.Storage.Internal.RelationalCommandBuilderFactory[1]
  Executed DbCommand (0ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
  SELECT [user].[Id]
  FROM [AspNetUsers] AS [user]
  WHERE [user].[Email] IN ('first@user.com', 'second@user.com')

So the next task was to use a rest client and use some@user'-- as form parameter and I got this result:

info: Microsoft.Data.Entity.Storage.Internal.RelationalCommandBuilderFactory[1]
  Executed DbCommand (1ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
  SELECT [user].[Id]
  FROM [AspNetUsers] AS [user]
  WHERE [user].[Email] IN ('some@user''--')

Here, the single ' got escaped to a double ''. This behaviour seems consistent with what is described in the docs under "Security Guarantee: LINQ queries use parameterization and escaping", which states that queries will either be parametrized or escaped.

I wonder, however, how is it decided when a query is parametrized and when to escape values? What reasons are there to chose either one? And I thought that escaping was never 100% safe, is this different now?

1
3
2/18/2016 2:37:24 PM

Accepted Answer

Great question. I believe the answer is it will be escaped if the element can be converted to a constant and it's not an UPDATE or INSERT. Here's why:

Looking at the SqlGenerator Souce shows a method called GenerateSql that looks like this:

internal static string GenerateSql(DbCommandTree tree, SqlVersion sqlVersion, out List<SqlParameter> parameters, out CommandType commandType, out HashSet<string> paramsToForceNonUnicode)
{
    SqlGenerator sqlGen;
    commandType = CommandType.Text;
    parameters = null;
    paramsToForceNonUnicode = null;

    switch (tree.CommandTreeKind)
    {
        case DbCommandTreeKind.Query:
            sqlGen = new SqlGenerator(sqlVersion);
            return sqlGen.GenerateSql((DbQueryCommandTree)tree, out paramsToForceNonUnicode);

        case DbCommandTreeKind.Insert:
            return DmlSqlGenerator.GenerateInsertSql((DbInsertCommandTree)tree, sqlVersion, out parameters);

        case DbCommandTreeKind.Delete:
            return DmlSqlGenerator.GenerateDeleteSql((DbDeleteCommandTree)tree, sqlVersion, out parameters);

        case DbCommandTreeKind.Update:
            return DmlSqlGenerator.GenerateUpdateSql((DbUpdateCommandTree)tree, sqlVersion, out parameters);

        case DbCommandTreeKind.Function:
            sqlGen = new SqlGenerator(sqlVersion);
            return GenerateFunctionSql((DbFunctionCommandTree)tree, out commandType);

        default:
            //We have covered all command tree kinds
            Debug.Assert(false, "Unknown command tree kind");
            parameters = null;
            return null;
    }
}

As you can see if it is a query it returns generated SQL without params. For other kinds, it will populate the List<SqlParameter>.

For whether it will handle a constant, we can look elsewhere in the same class:

There is a comment here that says:

// Constants will be sent to the store as part of the generated TSQL, not as parameters

We've already determined that for INSERT or UPDATE or DELETE, it will use parameters. So, this is for queries. Your List<string> is converted to constants when it's passed in as you saw from the logs. So, we just want to know what happens to those strings.

Then a big switch on the type happens, the relevant part of which is

case PrimitiveTypeKind.String:
    bool isUnicode;

    if (!TypeHelpers.TryGetIsUnicode(e.ResultType, out isUnicode))
    {
        // If the unicode facet is not specified, if needed force non-unicode, otherwise default to unicode.
        isUnicode = !_forceNonUnicode;
    }
    result.Append(EscapeSingleQuote(e.Value as string, isUnicode));
    break;

Which does a simple

private static string EscapeSingleQuote(string s, bool isUnicode)
{
    return (isUnicode ? "N'" : "'") + s.Replace("'", "''") + "'";
}

As to your other questions

What reasons are there to chose either one? And I thought that escaping was never 100% safe, is this different now?

I'd tend to agree with "escaping cannot be 100% safe" although I'm not a security expert, and someone could probably point you at some limited scope, 100% safe method. I would only choose escaping when I was 100% sure that there's nothing funny going on: that is, when the values can't possibly come straight from a user. As such, you may want to do some more testing on your implementation and decide if you need to secure it further, based on other factors such as access limitation, risk tolerance, sensitivity of data, and user roles.

2
2/18/2016 3:22:05 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow