Hacker News | __bee's comments

I think I am able to develop end-to-end ML projects. However, my question was about building a career in this space: how to move from junior to senior, and from senior to being an expert in the field.


I am a big fan of allenai :p


How does Cython compare to other languages such as Go/Rust? Are there any benchmarks out there?


Cython compiles to C that uses CPython's objects. If you can distill your algorithm into a pure C(ython) implementation, you get CPython objects + C code, which is then compiled with the (appropriate version of the) system compiler.

So for example, from this little Cython code:

    def cy(int x):
        return x + 1
You get the following C code:

    /* Python wrapper */
    static PyObject *__pyx_pw_6hworld_1cy(PyObject *__pyx_self, PyObject *__pyx_arg_x); /*proto*/
    static PyMethodDef __pyx_mdef_6hworld_1cy = {"cy", (PyCFunction)__pyx_pw_6hworld_1cy, METH_O, 0};
    static PyObject *__pyx_pw_6hworld_1cy(PyObject *__pyx_self, PyObject *__pyx_arg_x) {
      int __pyx_v_x;
      PyObject *__pyx_r = 0;
      __Pyx_RefNannyDeclarations
      __Pyx_RefNannySetupContext("cy (wrapper)", 0);
      assert(__pyx_arg_x); {
        __pyx_v_x = __Pyx_PyInt_As_int(__pyx_arg_x); if (unlikely((__pyx_v_x == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 1, __pyx_L3_error)
      }
      goto __pyx_L4_argument_unpacking_done;
      __pyx_L3_error:;
      __Pyx_AddTraceback("hworld.cy", __pyx_clineno, __pyx_lineno, __pyx_filename);
      __Pyx_RefNannyFinishContext();
      return NULL;
      __pyx_L4_argument_unpacking_done:;
      __pyx_r = __pyx_pf_6hworld_cy(__pyx_self, ((int)__pyx_v_x));

      /* function exit code */
      __Pyx_RefNannyFinishContext();
      return __pyx_r;
    }

    static PyObject *__pyx_pf_6hworld_cy(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_x) {
      PyObject *__pyx_r = NULL;
      __Pyx_RefNannyDeclarations
      __Pyx_RefNannySetupContext("cy", 0);
    /* … */
      /* function exit code */
      __pyx_L1_error:;
      __Pyx_XDECREF(__pyx_t_1);
      __Pyx_AddTraceback("hworld.cy", __pyx_clineno, __pyx_lineno, __pyx_filename);
      __pyx_r = NULL;
      __pyx_L0:;
      __Pyx_XGIVEREF(__pyx_r);
      __Pyx_RefNannyFinishContext();
      return __pyx_r;
    }
    /* … */
      __pyx_tuple_ = PyTuple_Pack(2, __pyx_n_s_x, __pyx_n_s_x); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 1, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_tuple_);
      __Pyx_GIVEREF(__pyx_tuple_);
    /* … */
      __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_6hworld_1cy, NULL, __pyx_n_s_hworld); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      if (PyDict_SetItem(__pyx_d, __pyx_n_s_cy, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_XDECREF(__pyx_r);
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_x + 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_r = __pyx_t_1;
      __pyx_t_1 = 0;
      goto __pyx_L0;
That compiles into a C extension that you can load with "import module" from within Python.
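For reference, the build step is just a small setup script. This is a minimal sketch: it assumes the snippet above is saved as hworld.pyx and that Cython is installed.

```python
# setup.py: minimal sketch, assumes the function above lives in hworld.pyx
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("hworld.pyx"))
```

Running `python setup.py build_ext --inplace` then produces the shared library that `import hworld` picks up.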


Good paper. I was wondering what the state of the art is for using neural networks for text segmentation, lemmatisation, and part-of-speech tagging. Morphological approaches are dominant in this space.


I've found RSS feeds to be a better alternative for reading news beyond the filter bubble that our social media platforms create.

There was an interesting tool on HN some time ago for monitoring a list of newspaper RSS feeds [1]. I wish this tool [2] were hosted somewhere so I could easily get notifications in my Slack without setting it up or managing it myself. With the load of information we face every day, the idea of monitoring RSS feeds through a Slack interface is very interesting.
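The core of such a tool is small. Here's a stdlib-only sketch of the two halves: parsing an RSS feed and formatting entries as a Slack-style message. (Real setups typically use feedparser and a Slack webhook URL; those, like the function names here, are assumptions and not part of the linked project.)

```python
# Minimal sketch: parse an RSS 2.0 feed and format entries for a Slack-style
# message. Stdlib only; a real monitor would fetch feeds on a schedule and
# post the result to a Slack incoming-webhook URL.
import xml.etree.ElementTree as ET

def rss_titles(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

def slack_message(entries):
    """Format entries using Slack's <url|text> link syntax, one per line."""
    return "\n".join(f"<{link}|{title}>" for title, link in entries)
```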

[1] https://github.com/tzano/wren/blob/master/wren/config/rss_fe...

[2] https://github.com/tzano/wren


Hate to be that person, but -

This is not the spirit that some of us want to see on HN. Efforts are always appreciated. The field is improving incrementally, and quickly.


How about SageMaker? Can we include it in this list? I played with SageMaker some time ago; it helps you build a whole pipeline to host your models, in addition to hosting your notebooks, and it bridges the gap between data scientists and data engineers.


Anecdotally, we considered using the hosted versions of Jupyter and Apache Zeppelin that are part of AWS SageMaker and EMR. We couldn't figure out a simple/familiar workflow for keeping the notebooks under version control. So, we agreed to run the notebooks locally, use a familiar Git-based workflow, and interact with the AWS infrastructure through the local notebook instances.


Does Zeppelin work naturally with Git? I've been struggling to get the right setup with just Jupyter.


Well, good question. The Jupyter file format is not ideal for 'code craftsmanship', as pointed out by another comment. There are utilities that strip some of the metadata from Jupyter files, such as rendered output and run counters, but whether to use them is a trade-off for your team to decide:

https://github.com/kynan/nbstripout
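To illustrate what such tools do: a notebook is just JSON, so stripping amounts to clearing outputs and execution counts from each code cell. This is a simplified sketch, not nbstripout itself (which handles more metadata and can install as a Git filter).

```python
# Sketch of notebook stripping: drop outputs and execution counts from a
# notebook's JSON so diffs under version control stay clean.
import json

def strip_notebook(nb_json):
    """Return notebook JSON with code-cell outputs and run counters cleared."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1)
```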


And... we will start seeing LinkedIn ads on GitHub.


You don't need all of this. All you need is to request your data from Twitter (your Tweet archive: https://twitter.com/settings/account), iterate through the CSV file, and use the tweet_id to unlike, delete, or do whatever you want through the Twitter API.

Source: I have done it before, and it took less time/work than what you describe.
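For the record, the archive approach is only a few lines. This is a sketch: the tweets.csv column name and the tweepy calls in the comments are assumptions about that era's archive format and API client; any read/write-authorized client works.

```python
# Sketch of the archive approach: read tweet ids from the archive's
# tweets.csv, then act on each one via an authenticated API client.
import csv

def tweet_ids(csv_path):
    """Yield every tweet id from the Twitter archive's tweets.csv."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row["tweet_id"]

# With a tweepy client (hypothetical setup, read/write credentials required):
# for tid in tweet_ids("tweets.csv"):
#     api.destroy_status(tid)      # delete a tweet
#     api.destroy_favorite(tid)    # or remove a like
```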


When you say "all of this", that's only true for the browser-scraping part. You still need to use the API's CreateFavorite and DestroyFavorite calls on old tweets.

(As discussed, purely calling DestroyFavorite won't work on Tweets outside the 3200-tweet-capped API-accessible data store).


It seemed obvious to me that this is what he meant.


With the exception of the data-retrieval method, the OP tried this, and suggests that simply having the tweet_id is not enough if the tweet it corresponds to happens to be old enough (or something) to not be accessible via the API.


I did exactly this a while ago (before deleting my Twitter account for good) and with the id extracted from the downloaded CSV I could delete everything. Perhaps they've changed policies recently.

I wrote a couple of Python scripts to keep your timeline tidy (delete everything from the beginning, then trim and leave only the last N): https://github.com/rinze/obliterate_tweets


The Twitter API only lets you fetch your most recent 3,200 tweets. That's not enough to delete all of them. If you request your data, you get all your tweets.

>> tweet_id is not enough

tweet_id is enough for any interaction if you give read/write permissions to the key you're using with the Twitter API. I deleted my tweets (going back to 2012).


But it doesn't unlike. Beyond the 3,200 limit you now have to re-like and then un-like each tweet, too.


Did anyone try Snips (https://snips.ai/), the open-source alternative to Google Home/Amazon Echo?


I hope like hell that Snips succeeds. This IoT stuff really needs to be open source. I don't want to fill my home with sensors I don't control, and I'm technical enough to deploy some home-brew stuff, but there's just not a whole lot out there in the way of open-source smart devices.

edit: Can anyone name some other privacy respecting, non-cloud, open source platforms and devices to work with?


To somewhat answer my own question: Snips has a pretty active community, which should hopefully be a good entry point into the ecosystem of privacy-friendly hardware projects (Discord, Twitter, and https://github.com/snipsco/awesome-snips#community-projects).


If you want to try it out, you can order a maker kit (RPi + microphone + speaker): https://makers.snips.ai/kit/

I received mine yesterday but haven't had time to play around with it yet.

