Let us worry about your assignment instead!

We Helped With This Python Programming Homework: Have A Similar One?

SOLVED
Category: Programming
Subject: Python
Difficulty: Undergraduate
Status: Solved
More Info: Python Programming Help
631011

Short Assignment Requirements

Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents. The primary technical objective of this assignment is to provide an implementation of a Multinomial Naive Bayes learning algorithm in Python for classifying tweets. Please see the attached file for the full project specification.

Assignment Description

               

The objective of this project is to build a Bayesian classifier that predicts the sentiment of tweets.

Due Date:

The assignment should be submitted to Blackboard before 10pm on Tuesday, October 23rd.

            

Assignment Marks:           

The distribution of marks is as follows:

1. Naive Bayes Algorithm 50%

2. A Basic Evaluation 10%

3. Research and Detailed Evaluation 40%

Parts 2 and 3 above will be submitted as a report along with your Python code from part 1.

The report should consist of two parts:

(i) The basic evaluation should describe the basic methods you have employed for cleaning the dataset (for example, converting everything to lower-case, removal of punctuation, etc.). It should also provide an account of the performance of the model and how it was impacted by the basic methods of cleaning the data.

    

(ii) The research and detailed evaluation of the algorithm should investigate the impact of more advanced pre-processing techniques on the classification accuracy of your Naïve Bayes classifier. Remember, you should test your algorithm using data that was not used to train the algorithm in the first place. The research element allows you to explore and report on the various efforts you have made to improve the classification accuracy of the algorithm.

         

Objectives  

Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents. The primary technical objective of this assignment is to provide an implementation of a Multinomial Naive Bayes learning algorithm in Python for classifying tweets.

            

On Blackboard you will find two files (train.csv and test.csv). Both files include the following columns:

1. id

2. Positive (1) and negative (0) label for a given tweet

3. Source: Sentiment140

4. Text

Once you have trained your model you should assess the accuracy of your model using the test dataset.

Naïve Bayes will treat the presence of each word as a single feature/attribute. This would give you as many features as there are words in your vocabulary. You should use a "bag of words" (Multinomial model) approach. The Multinomial model places emphasis on the frequency of occurrence of a word within documents of a class (see Week 4 lecture slides for more details and examples).

            

Stage 1 – Vocabulary Composition and Word Frequency Calculations

Develop code for reading all tweets from both the positive and negative files.
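As a rough illustration, a minimal sketch of reading the training tweets with Python's csv module is shown below. The column order (id, label, source, text) follows the list above; whether train.csv has a header row, and the exact label values, are assumptions you should verify against the actual file.

import csv

positive_tweets = []   # texts labelled 1
negative_tweets = []   # texts labelled 0

with open("train.csv", encoding="utf-8") as f:
    for row in csv.reader(f):
        # assumed column order: id, label, source, text
        label, text = row[1], row[3]
        if label == "1":
            positive_tweets.append(text)
        elif label == "0":
            negative_tweets.append(text)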

            

You should initially create a data structure to store all unique words in a vocabulary. A set data structure in Python is ideal for this purpose. You can keep adding lists of words to the set and it will only retain unique words.
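For example, a small sketch of building the vocabulary set (assuming the positive_tweets and negative_tweets lists from the reading step, and naive whitespace tokenisation) could be:

vocab = set()

for tweet in positive_tweets + negative_tweets:
    words = tweet.lower().split()   # naive whitespace tokenisation
    vocab.update(words)             # the set keeps only unique words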

            

Your next step is to record the frequency with which words occur in both the positive and negative tweets. I recommend that you use dictionaries to store the frequency of each word. (Note: the keys of each dictionary should correspond to all words in the vocabulary and the values should specify how often they occur for that class.) For example, if the word "brilliant" occurs 55 times in the positive tweets then the key-value pair in your positive dictionary should be <"brilliant" : 55>. You need to record the frequency of all the words for each class (positive and negative).

            

It can be useful when initially creating the positive or negative dictionary to use the values from the set (which contains all your unique words) to initialize all the keys for the dictionary. See example code below:

            

# this line creates a dictionary, which is initialized so that
# each key is a value from the set vocab
negDict = dict.fromkeys(vocab, 0)
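Following on from that, one possible way to fill in the per-class frequency counts (a sketch that assumes the vocab set and the positive_tweets/negative_tweets lists from earlier in Stage 1, using the same tokenisation as the vocabulary step so every word is already a key) is:

posDict = dict.fromkeys(vocab, 0)
negDict = dict.fromkeys(vocab, 0)

for tweet in positive_tweets:
    for word in tweet.lower().split():
        posDict[word] += 1      # e.g. posDict["brilliant"] might end up as 55

for tweet in negative_tweets:
    for word in tweet.lower().split():
        negDict[word] += 1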

            

            

            

            

Stage 2 – Word Probability Calculations

Once you have populated your positive and negative dictionaries with the frequency of each word, you must then work out the conditional probabilities for all words (for each class). In other words, for each word w you should work out P(w|positive) and P(w|negative). Refer to Week 3 lecture notes for more information. Remember this is a multinomial model.
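As an illustration only, the multinomial estimate of P(w|class) could be computed roughly as below. Add-one (Laplace) smoothing is an assumed choice here to avoid zero probabilities; check the Week 3 notes for the exact formula you are expected to use.

# total number of word occurrences observed in each class
pos_total = sum(posDict.values())
neg_total = sum(negDict.values())
vocab_size = len(vocab)

# P(w|positive) and P(w|negative) with add-one (Laplace) smoothing -- an assumption
probPos = {w: (posDict[w] + 1) / (pos_total + vocab_size) for w in vocab}
probNeg = {w: (negDict[w] + 1) / (neg_total + vocab_size) for w in vocab}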

Stage 3 – Classifying Unseen Tweets and Performing Basic Evaluation

The final section of your code will take as input a new tweet (a tweet that has not been used for training the algorithm) and classify the tweet as positive or negative. You will need to read all words from the tweet and determine the probability of that tweet being positive and the probability of it being negative.
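A minimal classification sketch, assuming the probPos/probNeg dictionaries and vocab set above, and working in log space to avoid numerical underflow (the class priors are simply estimated from the proportion of positive and negative training tweets):

import math

prior_pos = len(positive_tweets) / (len(positive_tweets) + len(negative_tweets))
prior_neg = 1 - prior_pos

def classify(tweet):
    """Return 1 if the tweet looks positive, 0 if it looks negative."""
    log_pos = math.log(prior_pos)
    log_neg = math.log(prior_neg)
    for word in tweet.lower().split():
        if word in vocab:                  # ignore words never seen in training
            log_pos += math.log(probPos[word])
            log_neg += math.log(probNeg[word])
    return 1 if log_pos > log_neg else 0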

For the basic evaluation of your algorithm you should run all tweets from the test dataset through your algorithm and determine the level of accuracy (the percentage of tweets correctly classified for each class).
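The basic accuracy check could then be as simple as the sketch below, which assumes test.csv has the same column layout as train.csv and uses the classify function from the previous stage (per-class accuracy follows the same pattern, counting positive and negative tweets separately):

import csv

correct = 0
total = 0

with open("test.csv", encoding="utf-8") as f:
    for row in csv.reader(f):
        label, text = row[1], row[3]     # assumed column order: id, label, source, text
        if label in ("0", "1"):
            total += 1
            if classify(text) == int(label):
                correct += 1

print("Accuracy:", correct / total)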

You should also try to clean the dataset by lower-casing all words and removing punctuation as much as possible. Your basic evaluation should describe the basic steps you took and whether any impact on accuracy was observed.
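One very basic way to do this (just a sketch; str.translate is one of several reasonable options) is:

import string

def basic_clean(text):
    """Lower-case the tweet and strip ASCII punctuation."""
    text = text.lower()
    return text.translate(str.maketrans("", "", string.punctuation))

print(basic_clean("Loving this MODULE!!!"))   # -> "loving this module"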

            

Research and Detailed Evaluation

The research aspect of this project is worth 40%. You should research common methods used for potentially improving the classification accuracy of your Naïve Bayes algorithm. Please note that basic techniques such as lowering the case of all words and punctuation removal will not be considered. Your report should provide a detailed account of the research and the subsequent implementation, as well as the updated results. You should cite all sources you used. Please note that you will not be docked marks for techniques that do not improve accuracy.

            

The regular expression library in Python may prove useful in performing pre-processing techniques (the re module, https://docs.python.org/3.6/library/re.html). It provides capabilities for extracting whole words and removing punctuation. See the example on the next page. You can find a tutorial on regular expressions at https://developers.google.com/edu/python/regular-expressions.
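As a rough illustration (not the example referenced above; the token pattern used here is just one assumption among many), the re module can pull out whole lower-case word tokens while dropping punctuation:

import re

def tokenize(text):
    """Extract whole words, dropping punctuation and other symbols."""
    return re.findall(r"[a-z']+", text.lower())

print(tokenize("Can't wait for #Friday!!"))   # -> ["can't", 'wait', 'for', 'friday']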

An alternative is the use of NLTK, Python's natural language toolkit (http://nltk.org/). Note that to use this from Spyder you will need to run nltk.download('all'). It is a powerful library that provides a range of capabilities including stemming, lemmatization, etc.
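For instance, a small sketch of stemming and lemmatization with NLTK (assuming the library is installed and its data has been downloaded, e.g. via nltk.download('all') as noted above):

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("running"))                    # -> 'run'
print(lemmatizer.lemmatize("better", pos="a"))    # -> 'good'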

            

            

Frequently Asked Questions

Is it free to get my assignment evaluated?

Yes. No hidden fees. You pay for the solution only, and all the explanations about how to run it are included in the price. It takes up to 24 hours to get a quote from an expert. In some cases, we can help you faster if an expert is available, but you should always order in advance to avoid the risks. You can place a new order here.

How much does it cost?

The cost depends on many factors: how far away the deadline is, how hard/big the task is, if it is code only or a report, etc. We try to give rough estimates here, but it is just for orientation (in USD):

Regular homework: $20 - $150
Advanced homework: $100 - $300
Group project or a report: $200 - $500
Mid-term or final project: $200 - $800
Live exam help: $100 - $300
Full thesis: $1000 - $3000

How do I pay?

Credit card or PayPal. You don't need to create or have a PayPal account in order to pay by credit card. PayPal offers you "buyer's protection" in case of any issues.

Why do I need to pay in advance?

We have no way to request money after we send you the solution. PayPal works as a middleman, which protects you in case of any disputes, so you should feel safe paying using PayPal.

Do you do essays?

No, unless it is a data analysis essay or report. This is because essays are very personal and it is easy to see when they are written by another person. This is not the case with math and programming.

Why are there no discounts?

It is because we don't want to lie: in such services no discount is ever genuine, because the price is set knowing that a discount will be applied. For example, if we wanted to ask for $100, we could say the price is $200 and, because you are special, offer a 50% discount. That is how all scam websites operate. We set honest prices instead, so there is no need for fake discounts.

Do you do live tutoring?

No, it is simply not how we operate. How often do you meet a great programmer who is also a great speaker? Rarely. It is why we encourage our experts to write down explanations instead of having a live call. It is often enough to get you started - analyzing and running the solutions is a big part of learning.

What happens if I am not satisfied with the solution?

Another expert will review the task, and if your claim is reasonable, we refund the payment and often block the freelancer from our platform. Because we are so strict with our experts, the ones working with us are very trustworthy and deliver high-quality assignment solutions on time.

Customer Feedback

"Thanks for explanations after the assignment was already completed... Emily is such a nice tutor! "

Order #13073



Paypal supported