Conversation
| -------- | ||
| .. literalinclude:: ../../../examples/dataframe/dataframe_loc.py | ||
| :language: python | ||
| :lines: 34- |
There was a problem hiding this comment.
It would be better to use `:lines: 36-`.
| "B": [3, 4, 1, 0, 222], | ||
| "C": [3.1, 8.4, 7.1, 3.2, 1]}, index=idx) | ||
| pd.testing.assert_series_equal(sdc_func(df), test_impl(df), check_names=False) | ||
|
|
There was a problem hiding this comment.
Add a test with an index label that is not contained in the DataFrame's index.
| -------- | ||
| .. literalinclude:: ../../../examples/dataframe/dataframe_loc.py | ||
| :language: python | ||
| :lines: 34- |
There was a problem hiding this comment.
| :lines: 34- | |
| :lines: 36- |
| raise TypingError('Operator getitem(). The index must be a single label, a list or array of labels,\ | ||
| a slice object with labels, a boolean array or a callable. Given: {}'.format(idx)) |
There was a problem hiding this comment.
Is getitem() correct? Shouldn't this message go into the Limitations block of the docstring instead?
There was a problem hiding this comment.
What about this recommendation?
There was a problem hiding this comment.
The message is needed so that the user understands that they passed something incorrect.
The same is done for Series.
There was a problem hiding this comment.
I meant:
- Actually the operator is not `getitem`, it's `loc`.
- `The index must be a single label, a list or array of labels, a slice object with labels, a boolean array or a callable` looks like a limitation, doesn't it?
There was a problem hiding this comment.
| raise TypingError('Operator getitem(). The index must be a single label, a list or array of labels,\ | |
| a slice object with labels, a boolean array or a callable. Given: {}'.format(idx)) | |
| ty_checker = TypeChecker('Operator loc().') | |
| ty_checker.raise_exc(idx, 'int', 'idx') |
I meant inserting a Limitations block into the docstring, as was done in 0e1ce3a#diff-37d3d013a811f054d85ea0713b88b1eeR1723-R1731. However, according to the code, idx can only be an integer.
There was a problem hiding this comment.
I already did that:
- Loc works with basic case only: single label
is in the Limitations block.
There was a problem hiding this comment.
You still raise an exception with an incorrect message.
| if self._dataframe._index[i] == idx: | ||
| data_0 = pandas.Series(self._dataframe._data[0], index=self._dataframe.index) | ||
| result_0 = data_0.at[idx] | ||
| data_1 = pandas.Series(self._dataframe._data[1], index=self._dataframe.index) | ||
| result_1 = data_1.at[idx] | ||
| return pandas.Series(data=[result_0[0], result_1[0]], index=['A', 'B'], name=str(idx)) |
| func_lines = ['def _df_getitem_single_label_loc_impl(self, idx):', | ||
| ' for i in numba.prange(len(self._dataframe.index)):', | ||
| ' if self._dataframe._index[i] == idx:'] | ||
| if isinstance(self.index, types.NoneType): | ||
| func_lines = ['def _df_getitem_single_label_loc_impl(self, idx):', | ||
| ' if -1 < idx < len(self._dataframe._data):'] |
There was a problem hiding this comment.
You will have incorrect indentation if index is None:
  if -1 < idx < len(self._dataframe._data): # 2 white spaces
      data_0 =... # 6 white spaces
| space = ' ' | ||
| if isinstance(self.index, types.NoneType): | ||
| func_lines = ['def _df_getitem_single_label_loc_impl(self, idx):', | ||
| ' if -1 < idx < len(self._dataframe._data):'] | ||
| space = '' | ||
| results = [] | ||
| result_index = [] | ||
| for i, c in enumerate(self.columns): | ||
| result_c = f"result_{i}" | ||
| func_lines += [f"{space} data_{i} = pandas.Series(self._dataframe._data[{i}], index=self._dataframe.index)", | ||
| f"{space} {result_c} = data_{i}.at[idx]"] | ||
| results.append(result_c) | ||
| result_index.append(c) | ||
| data = '[0], '.join(col for col in results) + '[0]' | ||
| func_lines += [f"{space} return pandas.Series(data=[{data}], index={result_index}, name=str(idx))", |
There was a problem hiding this comment.
It would be better to rename `space` to `indent`, something like this:
| space = ' ' | |
| if isinstance(self.index, types.NoneType): | |
| func_lines = ['def _df_getitem_single_label_loc_impl(self, idx):', | |
| ' if -1 < idx < len(self._dataframe._data):'] | |
| space = '' | |
| results = [] | |
| result_index = [] | |
| for i, c in enumerate(self.columns): | |
| result_c = f"result_{i}" | |
| func_lines += [f"{space} data_{i} = pandas.Series(self._dataframe._data[{i}], index=self._dataframe.index)", | |
| f"{space} {result_c} = data_{i}.at[idx]"] | |
| results.append(result_c) | |
| result_index.append(c) | |
| data = '[0], '.join(col for col in results) + '[0]' | |
| func_lines += [f"{space} return pandas.Series(data=[{data}], index={result_index}, name=str(idx))", | |
| indent = ' ' * 6 | |
| if isinstance(self.index, types.NoneType): | |
| func_lines = ['def _df_getitem_single_label_loc_impl(self, idx):', | |
| ' if -1 < idx < len(self._dataframe._data):'] | |
| indent = ' ' * 4 | |
| results = [] | |
| result_index = [] | |
| for i, c in enumerate(self.columns): | |
| result_c = f"result_{i}" | |
| func_lines += [f"{indent}data_{i} = pandas.Series(self._dataframe._data[{i}], index=self._dataframe.index)", | |
| f"{indent}{result_c} = data_{i}.at[idx]"] | |
| results.append(result_c) | |
| result_index.append(c) | |
| data = '[0], '.join(col for col in results) + '[0]' | |
| func_lines += [f"{indent}return pandas.Series(data=[{data}], index={result_index}, name=str(idx))", |
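The indentation concern above can be illustrated with a simplified, self-contained sketch. It is not the SDC code: the function names, the plain-list `data` layout, and the dict return value are assumptions made for the example. The point is that the body indent is derived from the nesting depth, so both the labeled and the index-less variants generate valid source.

```python
def build_loc_impl_source(columns, index_is_none):
    """Return source text for a row lookup (simplified stand-in, not SDC code)."""
    lines = ["def _loc_impl(data, index, idx):"]
    if index_is_none:
        # Positional lookup: the body is nested in a single `if` (depth 2).
        lines.append("    if -1 < idx < len(data[0]):")
        row, depth = "idx", 2
    else:
        # Label lookup: the body is nested in `for` + `if` (depth 3).
        lines.append("    for i in range(len(index)):")
        lines.append("        if index[i] == idx:")
        row, depth = "i", 3
    indent = " " * (4 * depth)  # one explicit width per nesting level

    results = []
    for pos in range(len(columns)):
        lines.append(f"{indent}result_{pos} = data[{pos}][{row}]")
        results.append(f"result_{pos}")
    names = repr(list(columns))
    values = ", ".join(results)
    lines.append(f"{indent}return dict(zip({names}, [{values}]))")
    lines.append("    raise KeyError(idx)")
    return "\n".join(lines)


def compile_loc_impl(columns, index_is_none):
    # Compile the generated source and return the resulting function.
    namespace = {}
    exec(build_loc_impl_source(columns, index_is_none), namespace)
    return namespace["_loc_impl"]
```

Compiling both variants in a unit test is a cheap way to catch a wrong indent, because `exec` raises `IndentationError` on malformed source.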
| Limitations | ||
| ----------- | ||
| - Parameter ``'name'`` in new DataFrame can be String only | ||
| - Loc works with basic case only: single label |
There was a problem hiding this comment.
| - Loc works with basic case only: single label | |
| - Parameter ``idx`` is supported only to be a single value, e.g. :obj:`df.loc['A']`. |
| Limitations | ||
| ----------- | ||
| - Parameter ``'name'`` in new DataFrame can be String only |
There was a problem hiding this comment.
What does the parameter name mean?
There was a problem hiding this comment.
In this case it means that the resulting Series (if the result is a Series) has a string name.
Maybe the limitation should be reworded to make it easier to understand.
There was a problem hiding this comment.
What is the difference between Pandas and SDC regarding the name of the Series?
There was a problem hiding this comment.
Don't we support a numeric name for the Series?
There was a problem hiding this comment.
Yes, we support only string names for Series.
If the base Series has a numeric name, we convert it to a string.
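A small illustration of the difference being discussed, using plain pandas only. The DataFrame contents are hypothetical, and `sdc_like_name` merely mimics the `name=str(idx)` expression from the generated implementation quoted earlier; it does not call SDC.

```python
import pandas as pd

# Hypothetical data with an integer index.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3, 4]}, index=[10, 20])

row = df.loc[10]
pandas_name = row.name            # pandas keeps the label itself as the Series name
sdc_like_name = str(pandas_name)  # the reviewed code builds the name with str(idx)
```

This is presumably why the tests above compare results with `check_names=False`.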
| data_0 = [] | ||
| for i in numba.prange(len(idx_list)): | ||
| index_in_list_0 = idx_list[i] | ||
| data_0.append(self._dataframe._data[0][index_in_list_0]) |
There was a problem hiding this comment.
You can't call append in a prange loop. Also, you could use sdc_take. @kozlov-alexey
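One way to avoid appending inside a parallel loop is to preallocate the output and have each iteration write only to its own slot. The sketch below is plain Python, with `range` standing in for `numba.prange`; the function name is hypothetical, and it does not call `sdc_take`, whose signature is not shown in this thread.

```python
def take_by_positions(column, positions):
    """Gather column[positions[i]] for every i without using append."""
    # Preallocate so that each iteration writes to a distinct element;
    # this is the property that makes the loop safe to parallelize.
    result = [None] * len(positions)
    for i in range(len(positions)):  # stand-in for numba.prange
        result[i] = column[positions[i]]
    return result
```

Under Numba the output would more likely be a preallocated NumPy array of the column's dtype than a Python list.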
| def _df_getitem_single_label_loc_impl(self, idx): | ||
| idx_list = [] | ||
| for i in range(len(self._dataframe.index)): | ||
| if self._dataframe._index[i] == idx: |
There was a problem hiding this comment.
What would happen if _index is None?
Also, it would be better to do this in parallel: split the index into chunks, create a list per chunk, and then merge them.
There was a problem hiding this comment.
What would happen if _index is None?
It is okay because I don't use dataframe._index when index is None.
|
@1e-to conflict |
sdc/functions/numpy_like.py
Outdated
| if arr[j] == idx: | ||
| res += 1 | ||
| length += res | ||
| arr_len[i] = res |
There was a problem hiding this comment.
You could allocate a list of lists whose outer length equals the number of chunks. Each chunk then appends only to its own list, which is safe, so a single parallel loop is enough, followed by another loop (probably not parallel) that merges all the lists into one.
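A minimal sketch of that suggestion in plain Python, assuming `range` as a stand-in for `numba.prange` on the outer loop. The function name and the default chunk count are hypothetical.

```python
def find_label_positions(index, label, n_chunks=4):
    """Return the positions where index[j] == label, collected per chunk."""
    n = len(index)
    chunk_size = (n + n_chunks - 1) // n_chunks  # ceiling division
    # One private list per chunk: each chunk appends only to its own list.
    per_chunk = [[] for _ in range(n_chunks)]

    for chunk in range(n_chunks):  # the loop that could become prange
        start = chunk * chunk_size
        stop = min(start + chunk_size, n)
        for j in range(start, stop):
            if index[j] == label:
                per_chunk[chunk].append(j)

    # Sequential merge; chunk order preserves the original position order.
    merged = []
    for positions in per_chunk:
        merged.extend(positions)
    return merged
```

Compared with the counting approach in the hunk above, this makes one pass over the data instead of a count pass followed by a fill pass.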