Optimizing an Excel-import validation endpoint with multithreading

Posted: 2019-08-03 00:16:57

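A requirement at work. An existing Excel import feature works like this: the Excel data is read and sent to the backend, which validates every row against the import rules and returns the results to the front end for an import preview. The front end waits synchronously for this response, and that is the hard part. The user then clicks the Import button and the actual import runs asynchronously. The front end does not wait for it, so that part is easy. The endpoint supported only 300 rows, and I was asked to make it support 3000.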

When solving a problem, the approach is what matters.

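First, I looked at the endpoint and found where the spreadsheet is read. There was a check that returned immediately if there were more than 300 rows, so I changed 300 to 3000.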

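Next, I analyzed what each row is validated against. All of that reference data comes from the database, and querying the database for every row is bound to be slow. Even a Redis lookup costs a network round trip and adds load to the cache. A single Redis instance handles roughly 120,000 queries per second. 120,000 / 3000 = 40, so if every row triggers its own query, about 40 users importing in the same second would be enough to saturate it. Querying the same data 3000 times is plainly wasteful. The goal is therefore to cut down the number of queries. I put a Redis cache in front of the database queries. Inside the method, I created a concurrency-safe ConcurrentHashMap to hold the values fetched from Redis. Each value is then fetched only once for the duration of the method call instead of over and over.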

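        Map<String, Object> map = new ConcurrentHashMap<>();
        // computeIfAbsent is atomic: for a given key the loader runs at most once,
        // even if several threads request that key at the same time
        Object obj = map.computeIfAbsent("key", k -> {
            // Query the Redis cache or, on a miss, the database
            String value = "data";
            return value;
        });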

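An object created inside a method is referenced only from that method's local variable. When the call returns and its stack frame is popped, the reference disappears and the object becomes eligible for garbage collection. During the 3000 validations, the cached objects stay in JVM heap memory. They can be reused quickly instead of being fetched again from the database or the cache. In other words, this is a local, in-JVM cache scoped to a single method call.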

This is the most important idea: within a single method, query as little as possible and reuse the query results.
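Here is a minimal runnable sketch of this per-call cache. It assumes 3000 rows that reference only 5 distinct codes. The names loadFromDb and validateAll are hypothetical stand-ins, not taken from the original code, and the counter exists only to show how many backend queries are made. Because the sketch uses ConcurrentHashMap.computeIfAbsent, each distinct code triggers a single load per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a cache scoped to one method call.
// loadFromDb is a hypothetical stand-in for the real Redis/database lookup.
class PerCallCacheDemo {

    // Counts how many times the (simulated) backend is queried
    static final AtomicInteger backendCalls = new AtomicInteger();

    static String loadFromDb(String code) {
        backendCalls.incrementAndGet();
        return "name-of-" + code; // pretend this value came from the database
    }

    // Validates every row; each distinct code is loaded at most once per call
    static List<String> validateAll(List<String> rowCodes) {
        // Local map: unreachable (and collectable) once this method returns
        Map<String, String> cache = new ConcurrentHashMap<>();
        List<String> results = new ArrayList<>(rowCodes.size());
        for (String code : rowCodes) {
            String value = cache.computeIfAbsent(code, PerCallCacheDemo::loadFromDb);
            results.add(code + " -> " + value);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 3000; i++) {
            rows.add("C" + (i % 5)); // 3000 rows but only 5 distinct codes
        }
        validateAll(rows);
        System.out.println("backend calls: " + backendCalls.get()); // prints "backend calls: 5"
    }
}
```

The map is created inside validateAll, so a second call starts with an empty cache and loads the codes again. That matches the "query once until the method call ends" scope described above.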

Once I had added the in-method ConcurrentHashMap cache, I tested it.

Result: at most 800 rows could be validated. Beyond that, the request took longer than 10 seconds and the front end timed out.

What now???

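Me to the product manager: "This requirement can't be done, it just isn't feasible..." Much back and forth followed, and it got me nowhere.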

So next I optimized it with multithreading.

1. Create a thread pool

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Shared thread pool.
 */
public class MyExecutor {

    /**
     * Number of core threads kept in the pool, even when idle.
     */
    private static final int CORE_POOL_SIZE = 10;
    /**
     * Maximum number of threads. Threads beyond the core size are created only
     * when the task queue is full. If the queue is full and the pool already has
     * this many threads, the RejectedExecutionHandler (AbortPolicy by default)
     * rejects the task.
     */
    private static final int MAXIMUM_POOL_SIZE = 200;
    /**
     * Idle timeout in seconds. It applies only to threads beyond the core size
     * that are idle, waiting for work. Example: with a core size of 10, a maximum
     * of 20 and 15 threads currently in the pool, if 3 of them wait longer than
     * the keep-alive time without getting a task, those 3 are terminated and
     * 12 threads remain.
     */
    private static final int KEEP_ALIVE_SECONDS = 30;
    /**
     * Capacity of the queue of waiting tasks.
     */
    private static final int QUEUE_CAPACITY = 10000;

    private static final ExecutorService POOL = new ThreadPoolExecutor(
            CORE_POOL_SIZE, MAXIMUM_POOL_SIZE, KEEP_ALIVE_SECONDS, TimeUnit.SECONDS,
            new LinkedBlockingDeque<>(QUEUE_CAPACITY));

    public static ExecutorService getPool() {
        return POOL;
    }
}
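One consequence of these parameters is worth checking. ThreadPoolExecutor adds threads beyond the core size only when the queue refuses a task, and a 10000-slot LinkedBlockingDeque never fills up with 3000 rows. The sketch below rebuilds a pool with the same values and reports getLargestPoolSize(). The class and method names are illustrative. Under these assumptions only the 10 core threads ever run, so the maximum of 200 has no effect for this workload.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: same parameters as MyExecutor (core 10, max 200, keep-alive 30 s, queue 10000).
// Threads beyond the core size are only created once the queue is full.
class PoolGrowthDemo {

    // Submits `tasks` trivial tasks and reports the largest pool size reached
    static int largestPoolSizeFor(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 200, 30, TimeUnit.SECONDS, new LinkedBlockingDeque<>(10000));
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { /* stand-in for validating one row */ });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        // 3000 tasks never fill a 10000-slot queue, so only the 10 core threads are created
        System.out.println(largestPoolSizeFor(3000)); // prints 10
    }
}
```

To get more parallelism, you have to raise the core size or use a smaller queue. Raising the maximum alone does not change the thread count. The right value depends on how much of each validation is spent waiting on I/O.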

2. Create an ordered list to hold the Futures returned by the thread pool, so the results can be collected one by one in order.

 List<Future<Object>> futureList = new LinkedList<>();

3. Get the thread pool

        // Get the shared thread pool
        ExecutorService pool = MyExecutor.getPool();

4. Read the Excel data and iterate over the rows, submitting one task per row to the thread pool.

        // Submit a Callable task to the thread pool
        Future<Object> future = pool.submit(new Callable<Object>() {
            @Override
            public Object call() throws Exception {
                // Validate this one row and return its result
                return null;
            }
        });

        // Add this row's Future to the ordered list
        futureList.add(future);

5. Iterate over futureList to collect the results.

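        for (Future<Object> oneFuture : futureList) {
            try {
                // Blocks until this task has finished, then returns its result
                Object result = oneFuture.get();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException e) {
                // The task itself threw; e.getCause() is the original exception
                e.getCause().printStackTrace();
            }
        }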

6. Combine all the results and return them. That completes the thread-pool rework of this method.
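Steps 2 to 6 can be combined into one runnable sketch. It assumes the rows have already been read into a list of strings. It uses a hypothetical validateRow check in which rows with an empty cell fail, because the original does not show the real validation logic. A fixed pool of 4 threads stands in for MyExecutor.getPool() so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of steps 2-6: one Callable per row, Futures kept in submission order.
// validateRow is a hypothetical placeholder for the real per-row checks.
class ParallelValidationDemo {

    static String validateRow(int rowNo, String cell) {
        return cell.isEmpty() ? "row " + rowNo + ": empty" : "row " + rowNo + ": ok";
    }

    static List<String> validateAll(List<String> cells, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Step 2: ordered list of Futures, one per row
        List<Future<String>> futures = new ArrayList<>(cells.size());
        // Step 4: submit one task per row
        for (int i = 0; i < cells.size(); i++) {
            final int rowNo = i + 1;
            final String cell = cells.get(i);
            Callable<String> task = () -> validateRow(rowNo, cell);
            futures.add(pool.submit(task));
        }
        // Steps 5-6: collect results in row order; get() blocks until each task is done
        List<String> results = new ArrayList<>(cells.size());
        for (Future<String> f : futures) {
            results.add(f.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stands in for MyExecutor.getPool()
        try {
            System.out.println(validateAll(Arrays.asList("a", "", "c"), pool));
            // prints [row 1: ok, row 2: empty, row 3: ok]
        } finally {
            pool.shutdown();
        }
    }
}
```

The Futures are read back in submission order. The results therefore line up with the spreadsheet rows, even though the rows are validated concurrently.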

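7. Then another problem came up. Each of the 3000 rows has an id. How can tasks running on different threads make sure each id is processed only once, and flag a row when its id is a duplicate???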

For this I used a concurrency-safe Set: ConcurrentSkipListSet.

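        // Concurrency-safe set for detecting duplicate ids
        // (create it once per import and share it with all row tasks)
        ConcurrentSkipListSet<Integer> idSet = new ConcurrentSkipListSet<>();

        // Inside each row task: id is the current row's id
        boolean added = idSet.add(id);
        if (!added) {
            // add() returned false: this id was already seen, so flag the row as a duplicate
        }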
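The sketch below shows how the shared set fits into the per-row tasks. It assumes integer ids and uses a hypothetical flagDuplicates helper. The key detail is that one idSet is created per import and shared by every task, while each task calls add() for its own row. Under concurrency, the number of flagged rows is deterministic: it equals the number of rows minus the number of distinct ids. Which of two rows with the same id gets flagged depends on thread scheduling, though. This approach does not guarantee that the first occurrence in the file is the one that passes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: row tasks on a pool share one ConcurrentSkipListSet of ids.
// flagDuplicates is a hypothetical helper, not part of the original code.
class DuplicateIdDemo {

    // Returns, in row order, true for each row flagged as a duplicate
    static List<Boolean> flagDuplicates(List<Integer> ids) throws Exception {
        ConcurrentSkipListSet<Integer> idSet = new ConcurrentSkipListSet<>(); // one set per import
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Boolean>> futures = new ArrayList<>(ids.size());
            for (Integer id : ids) {
                // add() returns false if another task already added this id
                Callable<Boolean> task = () -> !idSet.add(id);
                futures.add(pool.submit(task));
            }
            List<Boolean> flags = new ArrayList<>(ids.size());
            for (Future<Boolean> f : futures) {
                flags.add(f.get());
            }
            return flags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Boolean> flags = flagDuplicates(Arrays.asList(7, 3, 7, 9, 3, 7));
        long dupes = flags.stream().filter(b -> b).count();
        System.out.println("duplicates flagged: " + dupes); // prints "duplicates flagged: 3"
    }
}
```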

Let's look at the source of ConcurrentSkipListSet's add() method:

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element {@code e} to this set if
     * the set contains no element {@code e2} such that {@code e.equals(e2)}.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns {@code false}.
     *
     * @param e element to be added to this set
     * @return {@code true} if this set did not already contain the
     *         specified element
     * @throws ClassCastException if {@code e} cannot be compared
     *         with the elements currently in this set
     * @throws NullPointerException if the specified element is null
     */
    public boolean add(E e) {
        return m.putIfAbsent(e, Boolean.TRUE) == null;
    }

In short: if the set does not already contain an element equal to e, add() inserts it and returns true. If it does, the set is left unchanged and add() returns false.

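So our use of add() for id deduplication is correct. add() delegates to putIfAbsent on the underlying ConcurrentSkipListMap, and putIfAbsent is atomic. When several threads add the same id at once, exactly one of them gets true.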
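The atomicity claim can be checked directly. The sketch below starts 16 threads behind a CountDownLatch so that they call add(42) at nearly the same moment, then counts how many calls return true. The class and method names are illustrative.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: many threads race to add the same id; exactly one add() returns true.
class AddRaceDemo {

    static int winners(int threads) throws InterruptedException {
        ConcurrentSkipListSet<Integer> idSet = new ConcurrentSkipListSet<>();
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger wins = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // wait so all threads are released together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (idSet.add(42)) { // true only for the thread that inserted 42
                    wins.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            t.join();
        }
        return wins.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(winners(16)); // prints 1
    }
}
```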

Now let's look at the source of Future's get() method:

    /**
     * Waits if necessary for the computation to complete, and then
     * retrieves its result.
     *
     * @return the computed result
     * @throws CancellationException if the computation was cancelled
     * @throws ExecutionException if the computation threw an
     * exception
     * @throws InterruptedException if the current thread was interrupted
     * while waiting
     */
    V get() throws InterruptedException, ExecutionException;

In other words: get() waits, if necessary, for the computation to complete and then retrieves its result.

So Future's get() method is a blocking call.
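A get() call with no arguments can wait indefinitely, while the front end in this scenario gives up after 10 seconds. The overload get(long timeout, TimeUnit unit) may help here: it throws TimeoutException if the result is not ready in time. This sketch is an illustration and not part of the original solution. timesOut is a hypothetical helper, and the task simply sleeps to simulate slow validation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bounding the wait in Future.get with a timeout.
// timesOut is a hypothetical helper; the task sleeps to simulate slow validation.
class FutureTimeoutDemo {

    // Returns true if get(waitMillis) gave up before the task finished
    static boolean timesOut(long taskMillis, long waitMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> slowTask = () -> {
                Thread.sleep(taskMillis);
                return "done";
            };
            Future<String> future = pool.submit(slowTask);
            try {
                future.get(waitMillis, TimeUnit.MILLISECONDS); // waits at most waitMillis
                return false;
            } catch (TimeoutException e) {
                future.cancel(true); // give up on the result and interrupt the task
                return true;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timesOut(500, 50));  // prints true
        System.out.println(timesOut(10, 5000)); // prints false
    }
}
```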

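With that, I went from the original 300-row limit, through 800 rows hitting the 10-second timeout, to validating 3000 rows with a 7-second response.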

That both completed the task and improved performance.

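Along the way I used a thread pool, Future, Callable, and the concurrent collections ConcurrentHashMap and ConcurrentSkipListSet. This greatly improved my multithreading and concurrent-programming skills. The per-method-call data cache held in JVM memory was also a big conceptual leap for me.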


Original article: https://www.cnblogs.com/itbac/p/11291706.html
